1 Statistics For Microarray

1.1 About The Microarray Experiment

1.1.1 Overview

I will focus on the physical design of a microarray experiment. The experimental set-up is the careful planning of sequential, and sometimes parallel, activities so that the scientific discovery process runs smoothly. What is the microarray experiment about? It is a study of preterm birth whose main aim was to identify which genes are expressed by a population of cells or tissues by analyzing their expressed messenger ribonucleic acids (mRNAs).

1.1.2 About The Deoxyribonucleic & Ribonucleic Acids

The genome in the nucleus of a eukaryote contains the instructions for the activity of a cell. These instructions are written in a four-letter alphabet of nucleotides; they are first transcribed into RNA and then translated into proteins. The microarray is particularly suitable for measuring the transcription levels of different genes in different cells or conditions. The experimental set-up of a microarray experiment is a long and intricate process that involves several steps.

DNA is described as a double helix. It looks like a long twisted ladder. The sides of the ‘ladder’ are formed by a backbone of sugar and phosphate molecules, and the ‘crosspieces’ consist of two nucleotide bases joined weakly in the middle by hydrogen bonds. On either side of the ‘rungs’ lie complementary bases. Every Adenine base (A) is paired with a Thymine base (T), whereas every Guanine base (G) has a Cytosine partner (C) on the other side. Therefore, the strands of the helix are each other’s complement. It is this basic chemical fact of complementarity that lies at the basis of each microarray.

Microarrays have many single strands of a gene sequence segment attached to their surface, known as probes. This attachment is sometimes achieved by physically spotting them on the array and sometimes by immobilizing them to the quartz wafer surface via hydroxylation, as in Affymetrix arrays. In the future, undoubtedly, other media will become available. The single strands are waiting for complementary strands to hybridize and stick to the surface of the array. RNA delivers DNA’s genetic message to the cytoplasm of a cell where proteins are made. Chemically speaking, RNA is similar to a single strand of DNA.

The purpose of a microarray is to measure, for each gene in the genome, the amount of message that was broadcast through the RNA in the case sample compared to the control sample. Roughly speaking, color-labelled RNA is applied to the microarray, and if the RNA finds its complementary sibling on the array, then it naturally binds and sticks to the array. By measuring the amount of color emitted by the array, one can get a sense of how much RNA was produced for each gene (see the figure below).


1.1.3 About The Complementary DNA (cDNA)

Although some microarray experiments have been performed using RNA directly, most scientists prefer to work with the more stable cDNA molecule, which is the complementary copy of RNA. It is produced by a small copying machine, an enzyme called reverse transcriptase. It acts by copying a T for each A, an A for each U, a C for each G and a G for each C. In this way, it creates the complementary image of the RNA. RNA is a single strand of nucleotides, that is, effectively a string of four letters, A, U, G and C. Naturally, these letters tend to bind to T, A, C and G, respectively, when they are present. This is the principle behind microarray technology. To avoid RNA binding to itself, it is heated to 65 degrees Celsius. In this way, any self-hybridization of the RNA that has taken place is undone. After 5 minutes, the tubes are quickly cooled by putting them on ice for 2 minutes. In this way, any rehybridization of the RNA is prevented because the temperature is too low.
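The copying rule described above can be sketched in a few lines of base R. This is only an illustration: the helper name rna_to_cdna() and the example sequence are made up, and real reverse transcription is of course a biochemical process, not a string operation.

```r
# Hypothetical helper: build the cDNA letters complementary to an RNA template.
rna_to_cdna <- function(rna) {
  # Base pairing rule: A -> T, U -> A, G -> C, C -> G
  chartr("AUGC", "TACG", toupper(rna))
}

rna  <- "AUGGCCUAA"          # made-up RNA fragment
cdna <- rna_to_cdna(rna)
print(cdna)                  # "TACCGGATT", aligned base by base with the RNA

# The two strands are antiparallel, so written in the conventional
# 5' to 3' direction the cDNA letters appear in reverse order.
cdna_5_to_3 <- paste(rev(strsplit(cdna, "")[[1]]), collapse = "")
print(cdna_5_to_3)           # "TTAGGCCAT"
```

The same pairing rule explains why labelled cDNA sticks only to probes whose sequence is its complement.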

In all microarray experiments, it is essential that the biological material should be selected under controlled conditions. Failure to do so might increase spurious variation as a result of some uncontrolled nuisance factor. A total of 100 micrograms of RNA should be obtained for both samples. This quantity is sufficient for a single microarray. Modern techniques allow one to do the experiment even with a smaller quantity of RNA.

In this figure, I obtain a copy of the double-stranded cDNA from the original mRNA, which was purified from lysed cells.


1.1.4 Obtaining The Labelled cDNA

The end product of a microarray experiment is an image with gene spots of varying intensity for each of the treatment and the control samples. A smart way of making genes ‘visible’ has been developed by adding a dye to the cDNA so that the amount of cDNA that sticks to the microarray slide could be measured via an optical scanner. In order to build cDNA, the enzyme needs the nucleotide building blocks, A, T, G and C. Rather than adding 25 microliters of plain C nucleotides, the idea is to add 23 microliters of C’s that have a dye molecule attached to them, as well as 2 microliters of plain C’s. Each time the enzyme needs a C to copy a G, it will most likely use one with a dye molecule attached. Therefore, the number of dye molecules present in the cDNA is proportional to the number of G’s in the RNA, which is roughly proportional to the number of transcribed copies of the gene, as well as the length of the transcript.

Two different dyes, Cy3 and Cy5, are used to distinguish treatment and control samples. The reverse transcription can finally start when a reverse transcriptase is added. The enzyme is stored at low temperatures because it degrades quickly at higher temperatures. It performs its best RNA copying activity, however, when it is put in a 42 degrees Celsius environment. The control and treatment samples are brought up to this temperature immediately prior to adding the enzyme. Then the enzyme immediately starts its job. If enzyme degradation goes faster than normal, it is sometimes advisable to add some additional enzyme after one hour. Otherwise, the enzyme is allowed to do its job for a total of two hours. At this time, enough cDNA is produced to be applied to the microarray.

1.1.5 Preparing The cDNA Mixture For Hybridization

Not all the loose bases that were added into the RNA sample have been reverse transcribed into cDNA. These loose bases could possibly hybridize spuriously with the immobilized DNA on the array, and it is therefore sensible to filter them out. The mixture is passed through a membrane. The long strands of cDNA stick to the membrane, whereas the loose bases pass through. The RNA may also stick to the membrane, but since it is not labelled, this is immaterial. By turning the membrane the other way around, the labelled cDNA is recovered. The cDNA mixture is then dried down in a centrifuge in order to replace the liquid by a hybridization buffer. This hybridization buffer facilitates the kinetics of the actual hybridization, that is, the attachment of the cDNA produced from the sample of interest to the DNA material on the slide. So far, all the steps have been performed for the treatment and control sample side by side. At this point, the two samples are combined in quantities that should be as nearly equal as possible. The resulting mixture is then ready for hybridization to a single microarray.

Note that the cDNA, when left at room temperature for a while, may start to fold onto itself if there are complementary stretches in the cDNA sequence. This would inhibit hybridization to the array, and therefore steps have to be taken to avoid this folding while the microarray is being prepared. By heating the cDNA mixture to 85 degrees Celsius for 5 minutes and then rapidly cooling it by putting it on ice, the self-folding of the cDNA is prevented.

1.1.6 Slide Hybridization

Before applying the cDNA sample, the microarray is washed with a mixture of sodium chloride and sodium citrate (SSC) and sodium dodecyl sulfate (SDS). Then the cDNA mixture containing both control and treatment samples is applied to the slide with a pipette. By putting a hydrophobic coverslip on top of the mixture, the cDNA is evenly spread over the microarray and any air bubbles disappear. The slide is then mounted in a hybridization chamber, fixed and put in a dark environment at 42 degrees Celsius for 10 to 16 hours before visualization. During this time, the actual process of interest takes place: the cDNA, which was applied to the slide, binds with complementary strands on the array. The number of matches will eventually determine the color intensity of the scanned slide and give an indication of the amount of RNA transcript of that gene within the sample. After the hybridization process has been completed, it is essential to remove all the labelled cDNA that did not hybridize to the slide. If sufficient care is not taken to remove it, the dye that is attached to that cDNA may give a spurious signal. For that purpose, the microarray is washed several times with different concentrations of SSC. Finally, the microarray is dried and is ready to be scanned.
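Once a two-colour slide has been scanned, the two dye intensities of each spot are commonly compared on the log2 scale. The following base R sketch uses made-up intensities and assumes, purely for illustration, that Cy5 labels the treatment sample and Cy3 the control; the actual dye assignment depends on the experiment.

```r
# Made-up spot intensities for three genes (assumption: Cy5 = treatment, Cy3 = control).
cy5 <- c(geneA = 1200, geneB = 300,  geneC = 800)
cy3 <- c(geneA = 300,  geneB = 1200, geneC = 800)

# Log ratio: positive values mean more transcript in the treatment sample.
M <- log2(cy5 / cy3)

# Average log intensity of the spot over both channels.
A <- 0.5 * (log2(cy5) + log2(cy3))

print(data.frame(M = M, A = A))
```

On this scale a four-fold increase (M = 2) and a four-fold decrease (M = -2) are equally far from zero, which is one reason the log transformation is preferred over raw ratios.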

1.2 Affymetrix Microarray Workflow

1.2.1 Introduction

In this article, I will walk through an end-to-end Affymetrix microarray differential expression workflow using Bioconductor packages. This workflow is directly applicable to current gene type arrays. The data analyzed here is a typical clinical microarray data set that compares spontaneous preterm birth with term delivery. I will start from the raw data (CEL files), show how to import them into a Bioconductor ExpressionSet, perform quality control and normalization, and finally carry out differential gene expression (DGE) analysis, followed by some enrichment analysis.

1.2.2 Background Data Information

The raw CEL files are produced by the array scanner software and contain the measured probe intensities.

Each dataset at ArrayExpress is stored according to the MAGE-TAB (MicroArray Gene Expression Tabular) specifications as a collection of tables bundled with the raw data. The MAGE-TAB format specifies up to five different types of files: the Investigation Description Format (IDF) file contains top-level information about the experiment, including title, description, submitter contact details and protocols; the Array Design Format (ADF) file describes the array design; the Sample and Data Relationship Format (SDRF) file contains essential information on the experimental samples; the remaining two types are the raw data files and the processed data files. In Bioconductor, this information is typically combined in an ExpressionSet, which can be manipulated and is the input to or output of many Bioconductor functions.

1.2.3 The ExpressionSet Class

Before I move on to the actual raw data import, I will briefly introduce the ExpressionSet class contained in the Biobase package. It is commonly used to store microarray data in Bioconductor. The ExpressionSet class is designed to combine several different sources of information into a single convenient structure.

The data in an ExpressionSet consist of:
1) AssayData: Expression data from microarray experiments with microarray probes in rows and sample identifiers in columns.

2) Metadata:
a) PhenoData: A description of the samples in the experiment with sample identifiers in rows and description elements in columns; holds the content of the SDRF file.
b) FeatureData: Metadata about the features on the chip or technology used for the experiment, with the same rows as assayData by default and freely assignable columns.
c) Further annotations for the features.

3) ExperimentData: A flexible structure to describe the experiment.

One should keep in mind that the rownames of the phenoData have to match the column names of the assay data, while the row names of the assay data have to match the row names of the featureData. This is illustrated in the following figure.
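The naming rules above can be checked on a toy example. The sketch below deliberately uses only base R (a matrix and two data frames with made-up probe, sample and gene names) rather than Biobase itself; names_consistent() is a hypothetical helper, not a Biobase function.

```r
# Toy assay data: 3 probes (rows) x 2 samples (columns); all names are made up.
expr <- matrix(c(7.1, 8.3,
                 6.9, 9.0,
                 5.5, 5.8),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("probe1", "probe2", "probe3"),
                               c("sampleA", "sampleB")))

# Toy phenoData: one row per sample.
pheno <- data.frame(group = c("SPTB", "Term"),
                    row.names = c("sampleA", "sampleB"))

# Toy featureData: one row per probe.
feat <- data.frame(symbol = c("GENE1", "GENE2", "GENE3"),
                   row.names = c("probe1", "probe2", "probe3"))

# Hypothetical helper: TRUE only if both naming rules hold.
names_consistent <- function(expr, pheno, feat) {
  identical(rownames(pheno), colnames(expr)) &&
    identical(rownames(feat), rownames(expr))
}

print(names_consistent(expr, pheno, feat))                         # TRUE
print(names_consistent(expr, pheno[2:1, , drop = FALSE], feat))    # FALSE: samples reordered
```

With Biobase installed, the same three objects could be passed to ExpressionSet(), wrapping the two data frames in AnnotatedDataFrame(); the constructor then performs this kind of validity check itself.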


1.2.4 Pipeline Description

1.2.4.1 Data Importation

The analysis of Affymetrix arrays starts with CEL files. These result from processing the raw image files with the Affymetrix software and contain estimated probe intensity values. To look at the data, I can use the pData() function from the Biobase package, which directly accesses the phenoData in the ExpressionSet.

1.2.4.2 Raw Data Quality Control

The expression intensity values are in the assayData sub-object exprs and can be accessed by the exprs() function. The rows represent the microarray probes, while each column represents one microarray. I’ll also display the probe intensities as a boxplot with one box per individual microarray. Note that the boxplot() function can take expression sets as an argument. It accesses the expression data and performs a log2-transformation by default. I also draw histograms of some microarrays.
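The kind of plot described here can be sketched on simulated data. The example below uses base R only and a made-up intensity matrix standing in for the output of exprs(); because the base R boxplot() does not transform its input, the log2 transformation is applied explicitly.

```r
set.seed(1)

# Simulated raw intensities: 1000 probes (rows) x 3 arrays (columns).
raw_intensities <- matrix(2^rnorm(3000, mean = 8, sd = 1.5), ncol = 3,
                          dimnames = list(NULL, c("array1", "array2", "array3")))

# Intensities are strongly right-skewed, so work on the log2 scale.
log_intensities <- log2(raw_intensities)

# One box per microarray.
boxplot(log_intensities, ylab = "log2 intensity",
        main = "Raw probe intensities (simulated)")

# Histogram of a single microarray.
hist(log_intensities[, "array1"], breaks = 50,
     xlab = "log2 intensity", main = "array1 (simulated)")
```

Boxes that sit clearly higher or lower than the rest, or histograms with an unusual shape, point to arrays that deserve a closer look before normalization.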

1.2.5 Robust Multi-array Average

The standard method for normalization is RMA. It is one of the few normalization methods that uses only the perfect match (PM) probes. It consists of the following steps:
1) Background correction to correct for spatial variation within individual arrays: a background-corrected intensity is calculated for each PM probe in such a way that all background-corrected intensities are positive.
2) Log transformation to improve the distribution of the data: the base-2 logarithm of the background-corrected intensity is calculated for each probe. The log transformation makes the data less skewed and more normally distributed and provides an equal spread of up- and down-regulated expression ratios.
3) Quantile normalization to correct for variation between the arrays: equalizes the data distributions of the arrays and makes the samples directly comparable.
4) Probe normalization (summarization) to correct for variation within probe sets: equalizes the behavior of the probes between the arrays and combines the normalized data values of the probes from a probe set into a single value for the whole probe set.
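To make the last three RMA steps more concrete, here is a minimal base R sketch on simulated data. It is not the implementation used by any Bioconductor package: background correction is skipped, quantile_normalize() is a hypothetical helper, and the six simulated probes are simply assumed to form one probe set.

```r
set.seed(7)

# Simulated background-corrected PM intensities: 6 probes x 3 arrays,
# with array3 deliberately brighter than the others.
pm <- matrix(2^rnorm(18, mean = 8, sd = 1), nrow = 6,
             dimnames = list(NULL, c("array1", "array2", "array3")))
pm[, "array3"] <- pm[, "array3"] * 4

# Log transformation (base 2).
log_pm <- log2(pm)

# Quantile normalization: give every array the same empirical distribution.
quantile_normalize <- function(x) {
  ranks      <- apply(x, 2, rank, ties.method = "first")  # rank within each array
  sorted     <- apply(x, 2, sort)                          # sort each array
  reference  <- rowMeans(sorted)                           # mean of each quantile
  normalized <- apply(ranks, 2, function(r) reference[r])  # map ranks to reference
  dimnames(normalized) <- dimnames(x)
  normalized
}
norm_pm <- quantile_normalize(log_pm)

# Summarization by median polish: one value per array for the probe set,
# taken as the overall effect plus the array (column) effect.
mp <- medpolish(norm_pm, trace.iter = FALSE)
probe_set_value <- mp$overall + mp$col
print(round(probe_set_value, 3))
```

After normalization every array contains exactly the same set of values, only assigned to different probes, so the artificial brightness of array3 no longer dominates the comparison.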


1.2.5.1 Linear Models

In order to analyze which genes are differentially expressed between SPTB (including sPTD & PPROM) and TERM DELIVERY (Control), I’ll have to fit a linear model to the expression data. Linear models are the workhorse for the analysis of experimental data. They can be used to analyze almost arbitrarily complex designs; however, they also take a while to learn and understand, and a thorough description is beyond the scope of this workflow.

Linear models for microarrays: I will now apply linear models to microarrays. Specifically, I’ll discuss how to use the limma package for differential expression analysis. The package is designed to analyze complex experiments involving comparisons between many experimental groups simultaneously while remaining reasonably easy to use for simple experiments. The main idea is to fit a linear model to the expression data for each gene.

Empirical Bayes and other methods are used to borrow information across genes for the residual variance estimation, leading to moderated t-statistics and stabilizing the analysis for experiments with just a small number of arrays. In the following, I’ll use appropriate design and contrast matrices and fit a linear model to each gene separately.
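The idea of a design matrix, a contrast and one linear model per gene can be illustrated in base R. The sketch below uses simulated data and made-up group sizes, and it computes ordinary t-statistics only; it does not reproduce the empirical Bayes moderation that limma adds (in limma the corresponding steps are typically carried out with lmFit(), contrasts.fit() and eBayes()).

```r
set.seed(42)

# Hypothetical design: 4 term deliveries (control) and 4 SPTB samples.
group <- factor(rep(c("Term", "SPTB"), each = 4), levels = c("Term", "SPTB"))

# Design matrix without intercept: one column per group mean.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Contrast of interest: SPTB minus Term.
contrast <- c(Term = -1, SPTB = 1)

# Simulated log2 expression for 5 genes; gene1 is shifted upwards in SPTB.
expr <- matrix(rnorm(5 * 8, mean = 8, sd = 0.3), nrow = 5,
               dimnames = list(paste0("gene", 1:5), NULL))
expr["gene1", group == "SPTB"] <- expr["gene1", group == "SPTB"] + 2

# Fit the linear model to one gene and test the contrast (ordinary t-test).
fit_gene <- function(y) {
  fit <- lm.fit(design, y)
  s2  <- sum(fit$residuals^2) / fit$df.residual          # residual variance
  est <- sum(contrast * fit$coefficients)                # estimated log2 fold change
  se  <- sqrt(s2 * as.numeric(t(contrast) %*% solve(crossprod(design)) %*% contrast))
  t_stat <- est / se
  c(logFC = est, t = t_stat,
    p.value = 2 * pt(-abs(t_stat), df = fit$df.residual))
}

# One model per gene: rows are genes, columns are the statistics.
results <- t(apply(expr, 1, fit_gene))
print(round(results, 4))
```

With only a few arrays per group the per-gene variance estimate s2 is noisy, which is exactly the problem that borrowing information across genes is meant to reduce.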

1.2.5.2 Differential Gene Expression

Differential expression analysis means taking the normalised expression data and performing statistical analysis to discover quantitative changes in expression levels between experimental groups. The goal is to identify genes whose expression differs under different conditions. An important consideration for DGE analysis is correction for multiple testing, because thousands of genes are tested at once.
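As a small illustration of multiple testing correction, the base R function p.adjust() can be applied to a vector of p-values. The p-values below are made up, and the Benjamini-Hochberg method and the 0.05 threshold are chosen here only as common examples, not as the settings of this analysis.

```r
# Made-up raw p-values for seven genes.
p_values <- c(0.0001, 0.004, 0.019, 0.03, 0.2, 0.5, 0.8)

# Benjamini-Hochberg adjustment, which controls the false discovery rate.
adj_p <- p.adjust(p_values, method = "BH")

print(data.frame(p = p_values,
                 adj_p = round(adj_p, 4),
                 significant = adj_p < 0.05))
```

Four genes have a raw p-value below 0.05, but only three remain below 0.05 after adjustment, which shows how the correction guards against false positives.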

1.2.5.3 Gene Set Enrichment Analysis

Functional enrichment analysis is a method to identify classes of genes or proteins that are over-represented in a large set of genes or proteins, and may have an association with disease phenotypes. The method uses statistical approaches to identify significantly enriched or depleted groups of genes.
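One common statistical approach for over-representation is the hypergeometric test, which is equivalent to a one-sided Fisher's exact test. The base R sketch below uses entirely made-up counts and is meant only to show the shape of the calculation, not the method of any particular enrichment package.

```r
# Made-up counts for one gene set.
universe <- 20000   # genes measured on the array
gene_set <- 200     # genes annotated to the set (e.g. a pathway)
de_genes <- 500     # differentially expressed genes
overlap  <- 15      # differentially expressed genes that belong to the set

# Overlap expected by chance alone.
expected <- de_genes * gene_set / universe
print(expected)     # 5

# P(overlap >= 15) under the hypergeometric distribution.
p_enrich <- phyper(overlap - 1, gene_set, universe - gene_set, de_genes,
                   lower.tail = FALSE)
print(p_enrich)

# The same test expressed as a 2 x 2 table for Fisher's exact test.
tab <- matrix(c(overlap,            gene_set - overlap,
                de_genes - overlap, universe - gene_set - de_genes + overlap),
              nrow = 2)
print(fisher.test(tab, alternative = "greater")$p.value)
```

In practice this test is repeated for many gene sets, so the resulting p-values also need a multiple testing correction.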

1.2.5.4 Microarrays Limitations

Hybridisation-based approaches are high throughput and relatively inexpensive, but have several limitations which include:
a) the reliance upon existing knowledge about the genome sequence.
b) the high background levels owing to cross-hybridisation.
c) a limited dynamic range of detection owing to both background and saturation signals.
d) comparing expression levels across different experiments is often difficult and can require complicated normalisation methods.

2 List Of Packages

To analyze microarray data, I need packages from Bioconductor, a repository of R packages for the analysis of genomic data. Bioconductor packages use functions and objects from various other R packages, so I need to install several R packages. Additionally, I will need an R package for making graphs of the data, called ggplot2. In order to use the installed R and Bioconductor packages in R, I have to load them first.
Bioconductor makes heavy use of object-oriented programming in R. This means that a package consists of classes. The classes define the behaviour and characteristics of a set of similar objects that belong to the class. The characteristics that objects of a class can have are called slots, while the behaviour of the objects (the actions they can do) is described by the methods of a class.
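Slots and methods can be illustrated with a toy S4 class in base R. The class ToyArray and the generic nSamples() below are invented for this sketch and do not exist in Bioconductor.

```r
library(methods)

# A toy class with two slots (the characteristics of its objects).
setClass("ToyArray",
         representation(intensities = "matrix",
                        platform    = "character"))

# A generic function and its method for ToyArray (the behaviour of its objects).
setGeneric("nSamples", function(object) standardGeneric("nSamples"))
setMethod("nSamples", "ToyArray", function(object) ncol(object@intensities))

# Create an object and use it.
arr <- new("ToyArray",
           intensities = matrix(1:6, nrow = 3),
           platform    = "hypothetical-chip")

print(slotNames(arr))   # "intensities" "platform"
print(nSamples(arr))    # 2
```

Bioconductor classes follow the same pattern on a larger scale: accessor functions are methods that read or modify the slots of an object.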

library(Biobase) #package that contains functions needed for microarray data analysis.
library(oligo) #A package to analyze oligonucleotide arrays at probe-level. It currently supports Affymetrix (CEL files) and standardized data structures to represent genomic data.
library(limma) #Data analysis, linear models and differential expression for microarray data.

library(gplots) #plotting data.
library(ggplot2) #Create Elegant Data Visualisations Using the Grammar of Graphics.
library(ggcorrplot) #provides a solution for reordering the correlation matrix and displays the significance level on the plot. It also includes a function for computing a matrix of correlation p-values.

library(preprocessCore) #A library of core preprocessing routines.
library(plotly) #data analytics and visualization tools.
library(wesanderson) #A Wes Anderson Palette Generator.
library(dplyr) #package which provides a set of tools for efficiently manipulating datasets.
library(ggpubr) #facilitates the creation of beautiful ggplot2-based graphs for researcher with non-advanced programming backgrounds.
library(knitr) #provides a general-purpose tool for dynamic report generation in R.

PROJECT PART I - ANALYSIS OF MICROARRAY EXPERIMENT

Microarrays can be used in many types of experiments. Gene expression profiling is by far the most common use of microarray technology. The two colour microarrays can be used for this type of experiment. The process of analysing gene expression data involves:
1) Feature extraction
2) Quality control
3) Normalisation
4) Differential expression analysis
5) Biological interpretation of the results

3 Affymetrix Human Gene 2.1 ST Array Pipeline

The Human Gene 2.1 ST Array is designed to measure protein-coding and long intergenic non-coding RNA transcripts.

3.1 Open CEL-files From Newer Affymetrix Arrays (HuGene21ST)

I use the list.files() command to obtain the list of CEL files in the folder specified by celpath. Then I import all the CEL files with a single call to the read.celfiles() function from the oligo package.

celpath <- "~/Desktop/oliver/HuGene21ST/"
# list the files in the CEL folder, with their full paths
list <- list.files(celpath, full.names = TRUE)
# import CEL files containing raw probe-level data into an R FeatureSet object
data <- read.celfiles(list)
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437801_HTHuGene21_111412H_SL309_515122-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437802_HTHuGene21_111412H_SL310_515122-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437803_HTHuGene21_092712H_SL77_810384-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437804_HTHuGene21_092712H_SL78_810384-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437805_HTHuGene21_111912H_SL313_810392-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437806_HTHuGene21_111912H_SL314_810392-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437807_HTHuGene21_101512H_SL181_810401-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437808_HTHuGene21_101512H_SL182_810401-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437809_HTHuGene21_101512H_SL169_810413-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437810_HTHuGene21_101512H_SL170_810413-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437811_HTHuGene21_100412H_SL139_810416-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437812_HTHuGene21_100412H_SL140_810416-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437813_HTHuGene21_091912H_SL57_810421-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437814_HTHuGene21_091912H_SL58_810421-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437815_HTHuGene21_102512H_SL225_810424-1_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437816_HTHuGene21_102512H_SL226_810424-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437817_HTHuGene21_102912H_SL261_810430-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437818_HTHuGene21_102912H_SL262_810430-2C.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437819_HTHuGene21_101812H_SL211_810432-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437820_HTHuGene21_101812H_SL212_810432-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437821_HTHuGene21_092712H_SL81_810439-1B.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437822_HTHuGene21_092712H_SL82_810439-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437823_HTHuGene21_102512H_SL235_810447-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437824_HTHuGene21_102512H_SL236_810447-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437825_HTHuGene21_100212H_SL106_810460-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437826_HTHuGene21_100212H_SL103_810460-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437827_HTHuGene21_102912H_SL253_810462-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437828_HTHuGene21_102912H_SL254_810462-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437829_HTHuGene21_111412H_SL300_810469-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437830_HTHuGene21_111412H_SL297_810469-2_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437831_HTHuGene21_100912H_SL164_810477-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437832_HTHuGene21_100412H_SL125_810494-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437833_HTHuGene21_100412H_SL126_810494-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437834_HTHuGene21_092712H_SL73_810501-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437835_HTHuGene21_092712H_SL74_810501-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437836_HTHuGene21_092712H_SL87_810507-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437837_HTHuGene21_092712H_SL88_810507-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437838_HTHuGene21_100212H_SL99_810516-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437839_HTHuGene21_100212H_SL100_810516-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437840_HTHuGene21_100412H_SL133_810518-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437841_HTHuGene21_100412H_SL134_810518-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437842_HTHuGene21_101812H_SL202_810521-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437843_HTHuGene21_101812H_SL203_810521-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437844_HTHuGene21_102512H_SL227_810529-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437845_HTHuGene21_102512H_SL228_810529-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437846_HTHuGene21_101512H_SL183_810533-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437847_HTHuGene21_101512H_SL184_810533-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437848_HTHuGene21_101512H_SL173_810545-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437849_HTHuGene21_101512H_SL174_810545-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437850_HTHuGene21_082912H_SL19_810563-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437851_HTHuGene21_082912H_SL20_810563-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437852_HTHuGene21_102512H_SL229_810568-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437853_HTHuGene21_102512H_SL230_810568-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437854_HTHuGene21_100212H_SL109_810619-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437855_HTHuGene21_100212H_SL110_810619-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437856_HTHuGene21_091712H_SL39_810657-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437857_HTHuGene21_091712H_SL40_810657-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437858_HTHuGene21_092712H_SL95_812226-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437859_HTHuGene21_092712H_SL96_812226-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437860_HTHuGene21_082912H_SL8_812228-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437861_HTHuGene21_082912H_SL9_812228-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437862_HTHuGene21_102512H_SL217_812230-1_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437863_HTHuGene21_102512H_SL218_812230-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437864_HTHuGene21_101512H_SL191_812232-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437865_HTHuGene21_101512H_SL192_812232-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437866_HTHuGene21_101812H_SL213_812234-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437867_HTHuGene21_101812H_SL214_812234-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437868_HTHuGene21_100212H_SL111_812235-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437869_HTHuGene21_100212H_SL112_812235-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437870_HTHuGene21_101812H_SL204_812236-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437871_HTHuGene21_101812H_SL199_812236-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437872_HTHuGene21_111412H_SL311_812249-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437873_HTHuGene21_111412H_SL312_812249-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437874_HTHuGene21_111912H_SL335_812261-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437875_HTHuGene21_111912H_SL336_812261-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437876_HTHuGene21_101512H_SL171_812268-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437877_HTHuGene21_101512H_SL172_812268-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437878_HTHuGene21_082912H_SL18_812282-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437879_HTHuGene21_100212H_SL115_812285-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437880_HTHuGene21_100212H_SL116_812285-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437881_HTHuGene21_082912H_SL10_812292-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437882_HTHuGene21_082912H_SL11_812292-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437883_HTHuGene21_111912H_SL323_812296-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437884_HTHuGene21_111912H_SL324_812296-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437885_HTHuGene21_091912H_SL49_812302-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437886_HTHuGene21_091912H_SL50_812302-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437887_HTHuGene21_110512H_SL268_812309-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437888_HTHuGene21_110512H_SL265_812309-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437889_HTHuGene21_100212H_SL97_812324-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437890_HTHuGene21_100212H_SL98_812324-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437891_HTHuGene21_102912H_SL255_812329-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437892_HTHuGene21_102912H_SL256_812329-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437893_HTHuGene21_092712H_SL75_812342-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437894_HTHuGene21_092712H_SL76_812342-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437895_HTHuGene21_110512H_SL269_812344-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437896_HTHuGene21_110512H_SL270_812344-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437897_HTHuGene21_092712H_SL89_812359-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437898_HTHuGene21_092712H_SL90_812359-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437899_HTHuGene21_092712H_SL83_812366-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437900_HTHuGene21_092712H_SL84_812366-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437901_HTHuGene21_110512H_SL277_812387-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437902_HTHuGene21_110512H_SL278_812387-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437903_HTHuGene21_101812H_SL195_812396-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437904_HTHuGene21_101812H_SL196_812396-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437905_HTHuGene21_110512H_SL283_812407-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437906_HTHuGene21_110512H_SL284_812407-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437907_HTHuGene21_100212H_SL104_812448-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437908_HTHuGene21_100212H_SL105_812448-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437909_HTHuGene21_100412H_SL123_812459-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437910_HTHuGene21_100412H_SL124_812459-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437911_HTHuGene21_102512H_SL219_812477-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437912_HTHuGene21_102512H_SL221_812477-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437913_HTHuGene21_101512H_SL177_812509-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437914_HTHuGene21_101512H_SL178_812509-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437915_HTHuGene21_100412H_SL121_812518-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437916_HTHuGene21_100412H_SL122_812518-2C.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437917_HTHuGene21_102912H_SL241_812546-1_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437918_HTHuGene21_102912H_SL242_812546-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437919_HTHuGene21_111912H_SL315_812551-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437920_HTHuGene21_111912H_SL316_812551-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437921_HTHuGene21_100912H_SL149_812555-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437922_HTHuGene21_111412H_SL298_812559-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437923_HTHuGene21_111412H_SL299_812559-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437924_HTHuGene21_091912H_SL69_812562-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437925_HTHuGene21_091912H_SL70_812562-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437926_HTHuGene21_100912H_SL150_812566-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437927_HTHuGene21_100912H_SL151_812566-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437928_HTHuGene21_101512H_SL175_812573-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437929_HTHuGene21_101512H_SL176_812573-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437930_HTHuGene21_111412H_SL289_812574-1_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437931_HTHuGene21_111412H_SL290_812574-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437932_HTHuGene21_100912H_SL147_812581-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437933_HTHuGene21_100912H_SL148_812581-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437934_HTHuGene21_100412H_SL129_812586-1_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437935_HTHuGene21_100412H_SL130_812586-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437936_HTHuGene21_100912H_SL158_812587-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437937_HTHuGene21_100912H_SL159_812587-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437938_HTHuGene21_091912H_SL59_812590-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437939_HTHuGene21_091912H_SL60_812590-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437940_HTHuGene21_082912H_SL4_815072-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437941_HTHuGene21_082912H_SL5_815072-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437942_HTHuGene21_082912H_SL16_815073-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437943_HTHuGene21_082912H_SL17_815073-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437944_HTHuGene21_091712H_SL27_815076-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437945_HTHuGene21_091712H_SL28_815076-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437946_HTHuGene21_082912H_SL1_815082-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437947_HTHuGene21_110512H_SL273_815094-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437948_HTHuGene21_110512H_SL274_815094-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437949_HTHuGene21_091912H_SL63_815102-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437950_HTHuGene21_091912H_SL64_815102-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437951_HTHuGene21_091712H_SL33_815116-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437952_HTHuGene21_091712H_SL34_815116-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437953_HTHuGene21_100912H_SL156_815123-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437954_HTHuGene21_100912H_SL157_815123-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437955_HTHuGene21_091712H_SL31_815127-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437956_HTHuGene21_091712H_SL32_815127-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437957_HTHuGene21_082912H_SL2_815137-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437958_HTHuGene21_082912H_SL3_815137-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437959_HTHuGene21_091912H_SL55_815149-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437960_HTHuGene21_091912H_SL56_815149-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437961_HTHuGene21_102912H_SL245_815154-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437962_HTHuGene21_102912H_SL246_815154-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437963_HTHuGene21_082912H_SL21_815163-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437964_HTHuGene21_082912H_SL22_815163-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437965_HTHuGene21_110512H_SL285_815168-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437966_HTHuGene21_110512H_SL286_815168-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437967_HTHuGene21_091912H_SL67_815179-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437968_HTHuGene21_091912H_SL68_815179-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437969_HTHuGene21_110512H_SL266_815183-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437970_HTHuGene21_110512H_SL267_815183-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437971_HTHuGene21_102912H_SL247_815189-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437972_HTHuGene21_102912H_SL248_815189-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437973_HTHuGene21_082912H_SL14_815194-1B.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437974_HTHuGene21_082912H_SL15_815194-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437975_HTHuGene21_101512H_SL187_815196-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437976_HTHuGene21_101512H_SL188_815196-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437977_HTHuGene21_092712H_SL79_815200-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437978_HTHuGene21_092712H_SL80_815200-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437979_HTHuGene21_092712H_SL93_815218-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437980_HTHuGene21_092712H_SL94_815218-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437981_HTHuGene21_110512H_SL287_815219-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437982_HTHuGene21_110512H_SL288_815219-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437983_HTHuGene21_100412H_SL137_818022-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437984_HTHuGene21_100412H_SL138_818022-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437985_HTHuGene21_100412H_SL135_818023-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437986_HTHuGene21_100412H_SL136_818023-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437987_HTHuGene21_091912H_SL71_818025-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437988_HTHuGene21_091912H_SL72_818025-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437989_HTHuGene21_091712H_SL35_818032-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437990_HTHuGene21_091712H_SL36_818032-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437991_HTHuGene21_100412H_SL141_818034-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437992_HTHuGene21_100412H_SL142_818034-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437993_HTHuGene21_091712H_SL41_818036-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437994_HTHuGene21_091712H_SL42_818036-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437995_HTHuGene21_111912H_SL319_818046-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437996_HTHuGene21_111912H_SL320_818046-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437997_HTHuGene21_102912H_SL249_818054-1_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437998_HTHuGene21_102912H_SL250_818054-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1437999_HTHuGene21_110512H_SL279_818070-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438000_HTHuGene21_110512H_SL280_818070-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438001_HTHuGene21_101812H_SL205_818081-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438002_HTHuGene21_101812H_SL206_818081-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438003_HTHuGene21_091912H_SL51_818084-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438004_HTHuGene21_091912H_SL52_818084-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438005_HTHuGene21_092712H_SL85_818088-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438006_HTHuGene21_092712H_SL86_818088-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438007_HTHuGene21_111412H_SL301_818125-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438008_HTHuGene21_111412H_SL302_818125-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438009_HTHuGene21_091712H_SL37_818153-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438010_HTHuGene21_091712H_SL38_818153-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438011_HTHuGene21_091712H_SL45_818156-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438012_HTHuGene21_091712H_SL46_818156-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438013_HTHuGene21_101512H_SL185_818162-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438014_HTHuGene21_101512H_SL186_818162-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438015_HTHuGene21_101812H_SL198_818172-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438016_HTHuGene21_101812H_SL197_818172-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438017_HTHuGene21_100212H_SL107_818174-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438018_HTHuGene21_100212H_SL108_818174-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438019_HTHuGene21_100212H_SL117_818181-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438020_HTHuGene21_100212H_SL118_818181-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438021_HTHuGene21_101512H_SL189_818195-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438022_HTHuGene21_101512H_SL190_818195-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438023_HTHuGene21_111412H_SL305_818200-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438024_HTHuGene21_111412H_SL306_818200-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438025_HTHuGene21_101812H_SL193_818224-1_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438026_HTHuGene21_101812H_SL194_818224-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438027_HTHuGene21_101812H_SL200_818241-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438028_HTHuGene21_101812H_SL201_818241-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438029_HTHuGene21_091912H_SL53_818246-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438030_HTHuGene21_091912H_SL54_818246-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438031_HTHuGene21_101812H_SL207_818249-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438032_HTHuGene21_101812H_SL208_818249-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438033_HTHuGene21_100912H_SL152_818257-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438034_HTHuGene21_100912H_SL153_818257-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438035_HTHuGene21_100912H_SL162_818308-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438036_HTHuGene21_100912H_SL163_818308-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438037_HTHuGene21_110512H_SL275_818357-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438038_HTHuGene21_110512H_SL276_818357-2B.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438039_HTHuGene21_111912H_SL325_818361-1i.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438040_HTHuGene21_111912H_SL326_818361-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438041_HTHuGene21_102512H_SL231_818368-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438042_HTHuGene21_102512H_SL232_818368-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438043_HTHuGene21_100912H_SL145_818381-1B.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438044_HTHuGene21_100912H_SL146_818381-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438045_HTHuGene21_102912H_SL243_818409-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438046_HTHuGene21_102912H_SL244_818409-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438047_HTHuGene21_110512H_SL271_818481-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438048_HTHuGene21_110512H_SL272_818481-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438049_HTHuGene21_110512H_SL281_818614-1A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438050_HTHuGene21_110512H_SL282_818614-2C.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438051_HTHuGene21_111412H_SL291_818615-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438052_HTHuGene21_111412H_SL292_818615-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438053_HTHuGene21_111412H_SL295_818626-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438054_HTHuGene21_111412H_SL296_818626-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438055_HTHuGene21_111412H_SL303_818670-1A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438056_HTHuGene21_111412H_SL304_818670-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438057_HTHuGene21_111412H_SL307_818684-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438058_HTHuGene21_111412H_SL308_818684-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438059_HTHuGene21_111912H_SL317_818781-1C.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438060_HTHuGene21_111912H_SL318_818781-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438061_HTHuGene21_111912H_SL321_818827-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438062_HTHuGene21_111912H_SL322_818827-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438063_HTHuGene21_111912H_SL329_830347-1B.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438064_HTHuGene21_111912H_SL330_830347-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438065_HTHuGene21_091912H_SL61_830356-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438066_HTHuGene21_091912H_SL62_830356-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438067_HTHuGene21_111912H_SL331_830370-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438068_HTHuGene21_111912H_SL332_830370-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438069_HTHuGene21_111412H_SL293_830381-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438070_HTHuGene21_111412H_SL294_830381-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438071_HTHuGene21_100212H_SL101_830397-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438072_HTHuGene21_100212H_SL102_830397-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438073_HTHuGene21_100212H_SL113_830398-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438074_HTHuGene21_100212H_SL114_830398-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438075_HTHuGene21_100912H_SL160_830432-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438076_HTHuGene21_100912H_SL161_830432-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438077_HTHuGene21_100412H_SL127_830446-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438078_HTHuGene21_100412H_SL128_830446-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438079_HTHuGene21_082912H_SL23_830478-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438080_HTHuGene21_082912H_SL24_830478-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438081_HTHuGene21_101512H_SL179_830505-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438082_HTHuGene21_101512H_SL180_830505-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438083_HTHuGene21_091712H_SL47_830507-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438084_HTHuGene21_091712H_SL48_830507-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438085_HTHuGene21_102512H_SL237_830515-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438086_HTHuGene21_102512H_SL238_830515-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438087_HTHuGene21_100912H_SL165_830518-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438088_HTHuGene21_100912H_SL166_830518-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438089_HTHuGene21_100412H_SL131_830538-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438090_HTHuGene21_100412H_SL132_830538-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438091_HTHuGene21_100912H_SL154_830544-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438092_HTHuGene21_100912H_SL155_830544-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438093_HTHuGene21_100912H_SL167_830554-1C.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438094_HTHuGene21_100912H_SL168_830554-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438095_HTHuGene21_101812H_SL215_830560-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438096_HTHuGene21_101812H_SL216_830560-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438097_HTHuGene21_102912H_SL263_830561-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438098_HTHuGene21_102912H_SL264_830561-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438099_HTHuGene21_101812H_SL209_830575-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438100_HTHuGene21_101812H_SL210_830575-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438101_HTHuGene21_091712H_SL25_830576-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438102_HTHuGene21_091712H_SL26_830576-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438103_HTHuGene21_092712H_SL91_830584-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438104_HTHuGene21_092712H_SL92_830584-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438105_HTHuGene21_100412H_SL143_830587-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438106_HTHuGene21_100412H_SL144_830587-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438107_HTHuGene21_102512H_SL220_830590-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438108_HTHuGene21_102512H_SL222_830590-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438109_HTHuGene21_082912H_SL6_830597-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438110_HTHuGene21_082912H_SL7_830597-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438111_HTHuGene21_100212H_SL119_830607-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438112_HTHuGene21_100212H_SL120_830607-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438113_HTHuGene21_091712H_SL29_830656-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438114_HTHuGene21_091712H_SL30_830656-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438115_HTHuGene21_102512H_SL233_830692-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438116_HTHuGene21_102512H_SL234_830692-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438117_HTHuGene21_102512H_SL223_830741-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438118_HTHuGene21_102512H_SL224_830741-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438119_HTHuGene21_102912H_SL251_830762-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438120_HTHuGene21_102912H_SL252_830762-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438121_HTHuGene21_111912H_SL333_830790-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438122_HTHuGene21_111912H_SL334_830790-2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438123_HTHuGene21_091712H_SL43_830872-1A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438124_HTHuGene21_091712H_SL44_830872-2A.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438125_HTHuGene21_102912H_SL257_830909-1.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HuGene21ST//GSM1438126_HTHuGene21_102912H_SL258_830909-2.CEL
The object data is now a specific FeatureSet object (here a GeneFeatureSet) holding the raw intensities from my CEL files.

3.2 Data Exploration

3.2.1 Some Initial Statistics

In the expression matrix, each row represents one microarray probe and each column represents one microarray (sample). The expression intensity values are stored in the assayData sub-object exprs and can be accessed with the exprs() function.


## The number of microarray probes is equal to 1416100 and the number of microarray samples is equal to 326
## The type of the raw data is a GeneFeatureSet
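
Summary lines like the two above could be produced along the following lines. This is a minimal sketch under stated assumptions, not the code used for this report: toy_expr is a hypothetical stand-in for oligo::exprs(data), filled with a few of the intensities printed later in this section, and the column names array_1 to array_3 are made up for illustration.

```r
# Hypothetical stand-in for oligo::exprs(data): 5 probes (rows) x 3 arrays (columns).
# matrix() fills column by column, so each line below is one array.
toy_expr <- matrix(
  c(118, 5303.4, 133, 5557.9, 102,   # array_1
    130, 5664.5, 144, 5532.6,  91,   # array_2
    104, 4277.1, 106, 4889.8,  77),  # array_3
  nrow = 5,
  dimnames = list(as.character(1:5), c("array_1", "array_2", "array_3"))
)

# Rows are probes, columns are arrays (samples).
n_probes  <- nrow(toy_expr)
n_samples <- ncol(toy_expr)

summary_line <- paste(
  "The number of microarray probes is equal to", n_probes,
  "and the number of microarray samples is equal to", n_samples
)
cat(summary_line, "\n")

# With the real object (assuming `data` was returned by oligo::read.celfiles()):
#   dim(oligo::exprs(data))   # probes x samples
#   class(data)               # class of the raw-data object
```

On the real data, the same nrow() and ncol() calls applied to exprs(data) should give the probe and sample counts reported above.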

3.2.2 Retrieving Information About The Raw Data

How can I retrieve the intensities of specific rows in the CEL files? Two methods, exprs() and intensity(), return the intensity data. Both give the same result: a matrix with the intensities of all probes, which can then be subset by row and column.

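# Extract the full probe-by-array intensity matrix
expr <- oligo::exprs(data)
# Show the first 10 probes (rows) for the first 10 arrays (columns)
expr[1:10,1:10]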
##    GSM1437801_HTHuGene21_111412H_SL309_515122-1.CEL
## 1                                             118.0
## 2                                            5303.4
## 3                                             133.0
## 4                                            5557.9
## 5                                             102.0
## 6                                             117.0
## 7                                             124.0
## 8                                             206.0
## 9                                              97.0
## 10                                             72.0
##    GSM1437802_HTHuGene21_111412H_SL310_515122-2.CEL
## 1                                             130.0
## 2                                            5664.5
## 3                                             144.0
## 4                                            5532.6
## 5                                              91.0
## 6                                             168.0
## 7                                             115.0
## 8                                             146.0
## 9                                              91.0
## 10                                             78.0
##    GSM1437803_HTHuGene21_092712H_SL77_810384-1.CEL
## 1                                            104.0
## 2                                           4277.1
## 3                                            106.0
## 4                                           4889.8
## 5                                             77.0
## 6                                            101.0
## 7                                             80.0
## 8                                            209.0
## 9                                             53.0
## 10                                            54.0
##    GSM1437804_HTHuGene21_092712H_SL78_810384-2.CEL
## 1                                            121.0
## 2                                           5610.6
## 3                                            145.0
## 4                                           6418.1
## 5                                             88.0
## 6                                            183.0
## 7                                             64.0
## 8                                            172.0
## 9                                             82.0
## 10                                            59.0
##    GSM1437805_HTHuGene21_111912H_SL313_810392-1.CEL
## 1                                             161.0
## 2                                            4597.7
## 3                                             182.0
## 4                                            5007.0
## 5                                             100.0
## 6                                             173.0
## 7                                             121.0
## 8                                             198.0
## 9                                              86.0
## 10                                             84.0
##    GSM1437806_HTHuGene21_111912H_SL314_810392-2A.CEL
## 1                                              130.0
## 2                                             4706.3
## 3                                              133.0
## 4                                             5155.9
## 5                                              105.0
## 6                                              145.0
## 7                                               83.0
## 8                                              133.0
## 9                                               58.0
## 10                                              59.0
##    GSM1437807_HTHuGene21_101512H_SL181_810401-1.CEL
## 1                                             135.0
## 2                                            5351.1
## 3                                             117.0
## 4                                            5758.2
## 5                                              96.0
## 6                                             147.0
## 7                                              91.0
## 8                                             142.0
## 9                                              70.0
## 10                                             79.0
##    GSM1437808_HTHuGene21_101512H_SL182_810401-2.CEL
## 1                                             152.0
## 2                                            6002.2
## 3                                             158.0
## 4                                            6022.5
## 5                                              99.0
## 6                                             113.0
## 7                                              78.0
## 8                                              82.0
## 9                                              69.0
## 10                                             91.0
##    GSM1437809_HTHuGene21_101512H_SL169_810413-1.CEL
## 1                                             298.0
## 2                                            5577.3
## 3                                             292.0
## 4                                            5974.0
## 5                                             202.0
## 6                                             168.0
## 7                                             140.0
## 8                                             257.0
## 9                                             106.0
## 10                                            124.0
##    GSM1437810_HTHuGene21_101512H_SL170_810413-2.CEL
## 1                                             139.0
## 2                                            4587.3
## 3                                             142.0
## 4                                            4813.3
## 5                                              95.0
## 6                                             112.0
## 7                                              73.0
## 8                                             117.0
## 9                                              66.0
## 10                                             69.0

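Because exprs() returns an ordinary numeric matrix, specific probes and arrays can be picked out by position or by name. The following is a minimal sketch, assuming a hypothetical toy matrix (toy_expr, with shortened made-up column names) in place of the real expr object; its values are the first intensities printed above.

```r
# Hypothetical stand-in for the `expr` matrix above (probes x arrays).
toy_expr <- matrix(
  c(118, 5303.4, 133, 5557.9, 102,   # first array
    130, 5664.5, 144, 5532.6,  91),  # second array
  nrow = 5,
  dimnames = list(as.character(1:5), c("array_1", "array_2"))
)

# By position: probes 2 and 4, all arrays.
by_position <- toy_expr[c(2, 4), ]

# By name: row names are probe indices stored as character strings.
by_name <- toy_expr[c("2", "4"), "array_1"]

# drop = FALSE keeps the matrix shape when a single row is selected.
one_probe <- toy_expr[3, , drop = FALSE]

print(by_position)
print(by_name)
print(dim(one_probe))
```

The same indexing applies to the real matrix, for example expr[c(2, 4), 1:3]; only the toy object and its column names are assumptions.
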
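How can I retrieve the intensities of only the PM probes for specific rows in the CEL files? I use the pm() function. As the row labels in the output show, the rows of the resulting matrix keep their original probe indices.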

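# Extract the intensity matrix restricted to the PM probes
pm <- oligo::pm(data)
# Show the first 10 PM probes (rows) for the first 10 arrays (columns)
pm[1:10,1:10]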
##    GSM1437801_HTHuGene21_111412H_SL309_515122-1.CEL
## 6                                               117
## 7                                               124
## 9                                                97
## 10                                               72
## 12                                              108
## 15                                               50
## 23                                              140
## 25                                               44
## 26                                               43
## 27                                               41
##    GSM1437802_HTHuGene21_111412H_SL310_515122-2.CEL
## 6                                               168
## 7                                               115
## 9                                                91
## 10                                               78
## 12                                              119
## 15                                               62
## 23                                              153
## 25                                               83
## 26                                               49
## 27                                               70
##    GSM1437803_HTHuGene21_092712H_SL77_810384-1.CEL
## 6                                              101
## 7                                               80
## 9                                               53
## 10                                              54
## 12                                             121
## 15                                              51
## 23                                              74
## 25                                              36
## 26                                              41
## 27                                              40
##    GSM1437804_HTHuGene21_092712H_SL78_810384-2.CEL
## 6                                              183
## 7                                               64
## 9                                               82
## 10                                              59
## 12                                             134
## 15                                              48
## 23                                              52
## 25                                              42
## 26                                              45
## 27                                              41
##    GSM1437805_HTHuGene21_111912H_SL313_810392-1.CEL
## 6                                               173
## 7                                               121
## 9                                                86
## 10                                               84
## 12                                              137
## 15                                               59
## 23                                               58
## 25                                               40
## 26                                               43
## 27                                               38
##    GSM1437806_HTHuGene21_111912H_SL314_810392-2A.CEL
## 6                                                145
## 7                                                 83
## 9                                                 58
## 10                                                59
## 12                                                71
## 15                                                55
## 23                                                43
## 25                                                45
## 26                                                42
## 27                                                43
##    GSM1437807_HTHuGene21_101512H_SL181_810401-1.CEL
## 6                                               147
## 7                                                91
## 9                                                70
## 10                                               79
## 12                                              100
## 15                                               69
## 23                                               46
## 25                                               69
## 26                                               42
## 27                                               45
##    GSM1437808_HTHuGene21_101512H_SL182_810401-2.CEL
## 6                                               113
## 7                                                78
## 9                                                69
## 10                                               91
## 12                                               79
## 15                                               64
## 23                                               78
## 25                                               74
## 26                                               57
## 27                                               56
##    GSM1437809_HTHuGene21_101512H_SL169_810413-1.CEL
## 6                                               168
## 7                                               140
## 9                                               106
## 10                                              124
## 12                                              145
## 15                                               99
## 23                                               84
## 25                                               84
## 26                                               79
## 27                                               92
##    GSM1437810_HTHuGene21_101512H_SL170_810413-2.CEL
## 6                                               112
## 7                                                73
## 9                                                66
## 10                                               69
## 12                                              105
## 15                                               56
## 23                                               52
## 25                                               59
## 26                                               61
## 27                                               51

Apart from the expression data itself, a microarray data set needs to include information about the samples that were hybridized to the arrays. This sample annotation is stored in the phenoData, which contains labels for the samples. For most data sets, however, the phenoData has not been defined. How do I retrieve the sample annotation of the data?

ph <- data@phenoData; ph
## An object of class 'AnnotatedDataFrame'
##   rowNames: GSM1437801_HTHuGene21_111412H_SL309_515122-1.CEL
##     GSM1437802_HTHuGene21_111412H_SL310_515122-2.CEL ...
##     GSM1438126_HTHuGene21_102912H_SL258_830909-2.CEL (326 total)
##   varLabels: index
##   varMetadata: labelDescription channel

How do I retrieve the probe annotation of the data?

feat <- data@featureData
feat@data
## data frame with 0 columns and 1416100 rows
As the output shows, the featureData has not been defined. I’ll also retrieve the number of probes represented on the arrays.
length(probeNames(data))
## [1] 1025088

3.2.3 Checking For Missing Values

expr <- Biobase::exprs(data)
# Count every NA entry; is.na() is also TRUE for NaN, so NaN entries are excluded here.
NA_values <- sum(is.na(expr) & !is.nan(expr))
# Count individual NaN and infinite entries instead of testing whether a whole column is NaN or infinite.
NaN_values <- sum(is.nan(expr))
infinite_values <- sum(is.infinite(expr))
# Count empty strings; na.rm = TRUE keeps NA comparisons from turning the sum into NA.
blank_values <- sum(expr == "", na.rm = TRUE)
This table summarizes the number of missing values in this GeneFeatureSet.

Type             Count
NA values        0
NaN values       0
Infinite values  0
Blank values     0

3.3 Quality Control

Since the phenoData object does not contain any sample information, Bioconductor simply assigns the CEL files an index from 1 to 326. Because the phenoData is used for labels in plots, I am going to give the samples more informative names that can be used in the plots that follow.


ph@data[ ,1] <- c("control1","control2","control3","control4","control5","control6","control7","control8","sPTD1","sPTD2",
"control9","control10","control11","control12","control13","control14","control15","control16","control17","control18",
"control19","control20","control21","control22","control23","control24","control25","control26","control27","control28","sPTD3",
"control29","control30","control31","control32","control33","control34","control35","control36","control37","control38","control39","control40","PPROM1","PPROM2",
"control41","control42","control43","control44","control45","control46","control47","control48","control49","control50","sPTD4","sPTD5",
"control51","control52","sPTD6","sPTD7",
"control53","control54","control55","control56","control57","control58","control59","control60","control61","control62","control63","control64","control65","control66","control67","control68","PPROM3","sPTD8","sPTD9",
"control69","control70","control71","control72","PPROM4","PPROM5","sPTD10","sPTD11",
"control73","control74","control75","control76","PPROM6","PPROM7","control77","control78","PPROM8","PPROM9",
"control79","control80","control81","control82","control83","control84","control85","control86","PPROM10","PPROM11","PPROM12","PPROM13",
"control87","control88","control89","control90","control91","control92","control93","control94","control95","control96","PPROM14",
"control97","control98","control99","control100","PPROM15","PPROM16",
"control101","control102","control103","control104","control105","control106","control107","control108","PPROM17","PPROM18",
"control109","control110","control111","control112","sPTD12","sPTD13","PPROM19","PPROM20","PPROM21",
"control113","control114","control115","control116","PPROM22","PPROM23",
"control117","control118","control119","control120","control121","control122","PPROM24","PPROM25",
"control123","control124","control125","control126","control127","control128","PPROM26","PPROM27",
"control129","control130","control131","control132","control133","control134","control135","control136","sPTD14","sPTD15","sPTD16","sPTD17",
"control137","control138","control139","control140","PPROM28","PPROM29",
"control141","control142","control143","control144","PPROM30","PPROM31",
"control145","control146","control147","control148","control149","control150","control151","control152","control153","control154",
"control155","control156","control157","control158","control159","control160","control161","control162","control163","control164","PPROM32","PPROM33",
"control165","control166","control167","control168","control169","control170","PPROM34","PPROM35",
"control171","control172","PPROM36","PPROM37","PPROM38","PPROM39","control173","control174","PPROM40","PPROM41",
"control175","control176","control177","control178","control179","control180","control181","control182","PPROM42","PPROM43",
"control183","control184","PPROM44","PPROM45","PPROM46","PPROM47","PPROM48","PPROM49","sPTD18","sPTD19","PPROM50","PPROM51","sPTD20","sPTD21",
"PPROM52","PPROM53","PPROM54","PPROM55","PPROM56","PPROM57",
"control185","control186","control187","control188","control189","control190","control191","control192","sPTD22","sPTD23","PPROM58","PPROM59",
"control193","control194","sPTD24","sPTD25","control195","control196","PPROM60","PPROM61",
"control197","control198","control199","control200","control201","control202","control203","control204","control205","control206","control207","control208","sPTD26","sPTD27",
"control209","control210","control211","control212","control213","control214","control215","control216","control217","control218","sPTD28","sPTD29",
"control219","control220","control221","control222","control223","control224","control225","control226","control227","control228",
"PPROM62","PPROM63","PPROM64","PPROM65","PPROM66","PPROM67","PPROM68","PPROM69"); ph
## An object of class 'AnnotatedDataFrame'
##   rowNames: GSM1437801_HTHuGene21_111412H_SL309_515122-1.CEL
##     GSM1437802_HTHuGene21_111412H_SL310_515122-2.CEL ...
##     GSM1438126_HTHuGene21_102912H_SL258_830909-2.CEL (326 total)
##   varLabels: index
##   varMetadata: labelDescription channel

3.3.1 Creation Of Various Plots

It’s time to create some plots to assess the quality of the data.

MA plots were originally developed for two-color arrays to detect differences between the two color labels on the same array. For single-channel arrays such as these, each array is instead compared with a median array. The MA plot shows to what extent the variability in expression depends on the expression level.
In an MA plot, M is plotted against A:
M is the difference between the log-intensity of a probe on the array and the median log-intensity of that probe over all arrays. Formula: M = logPMInt_array - logPMInt_medianarray
A is the average of the log-intensity of a probe on that array and the median log-intensity of that probe over all arrays. Formula: A = (logPMInt_array + logPMInt_medianarray)/2
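
The two formulas can be tried out on a small simulated matrix. The following is only a sketch in base R, assuming log2 PM intensities with probes in rows and arrays in columns; the matrix log_pm, the helper ma_values() and all numbers are made up for illustration and are not part of the study data or of the oligo package.

```r
# Simulated log2 PM intensities: 1000 probes (rows) x 5 arrays (columns).
# These values are invented for illustration; they are not the study data.
set.seed(1)
log_pm <- matrix(rnorm(5000, mean = 8, sd = 1.5), nrow = 1000, ncol = 5)

# Reference "median array": the median of each probe across all arrays.
median_array <- apply(log_pm, 1, median)

# Hypothetical helper: M and A for one array compared with the median array.
ma_values <- function(log_int, reference, array_index) {
  x <- log_int[, array_index]
  data.frame(A = (x + reference) / 2,  # average log-intensity
             M = x - reference)        # log-ratio against the reference
}

ma1 <- ma_values(log_pm, median_array, 1)
head(ma1)

# M on the vertical axis against A on the horizontal axis, with the
# M = 0 reference line in blue and a lowess trend in red.
plot(ma1$A, ma1$M, pch = ".", xlab = "A", ylab = "M")
abline(h = 0, col = "blue")
lines(lowess(ma1$A, ma1$M), col = "red")
```

Because the simulated arrays share one distribution, the trend line should stay close to M = 0 here; on real raw data it typically drifts away.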

I’m going to draw MA plots for only the first few microarrays, because plotting more than ten arrays is computationally expensive. The which argument allows me to specify which array to compare with the median array. Note that I didn’t use par() to combine the plots into one figure, so that each MA plot stays large enough to read clearly.

for(i in 1:3){
  MAplot(data,which=i)
}

Ideally, the cloud of data points should be centered around M=0 (blue line), and the variability of the M values should be similar for different A values (average intensities). Here, however, the spread of the cloud increases with the average intensity: the loess curve (red line) moves further and further away from M=0 as A increases. To remove (some of) this dependency, I should normalize the data.

Next, I’ll check the distribution of the signal values across the samples.
# Reuse the sample labels stored in ph and color each box by group:
# red = control, green = sPTD, blue = PPROM.
sample_names <- ph@data[ ,1]
group_colors <- ifelse(grepl("^control", sample_names), "red", ifelse(grepl("^sPTD", sample_names), "green", "blue"))
oligo::boxplot(data, target = "core", main = "Boxplot of log2-intensities for the raw data", las=2, names=sample_names, col=group_colors)

3.4 Data Normalization

There are many sources of noise in microarray experiments:
1) Different amounts of RNA used for labeling and hybridization
2) Imperfections on the array surface
3) Imperfect synthesis of the probes
4) Differences in hybridization conditions
Systematic differences between the samples that are due to noise rather than true biological variability should be removed in order to draw biologically meaningful conclusions from the data.


3.4.1 Robust Multi-array Average

The standard method for normalization is RMA. It is one of the few normalization methods that uses only the PM probes. But how do I normalize the data using RMA? The rma() method produces a data matrix for Affymetrix arrays. The input of the rma() function is a FeatureSet object, while its output is an ExpressionSet object with the data matrix containing the normalized log-intensities in the exprs slot.

data.rma <- oligo::rma(data)
## Background correcting
## Normalizing
## Calculating Expression
data.matrix <- Biobase::exprs(data.rma)
Normalization can also be done using the GCRMA algorithm. GCRMA is based on RMA and retains its advantages. The difference lies in the background correction; all other steps are the same. GCRMA corrects for non-specific binding to the probes, in contrast to RMA, which completely ignores the issue of non-specific binding.
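
The "Normalizing" step reported by rma() is a quantile normalization, which forces every array to share the same intensity distribution. The following is a simplified sketch in base R on a toy matrix, not the implementation used by oligo; the function quantile_normalize() and the values are hypothetical, and ties are simply broken by order of appearance.

```r
# Toy log2 intensity matrix: 6 probes x 3 arrays (values invented).
x <- matrix(c(5, 2, 3, 4, 8, 6,
              4, 1, 4, 2, 9, 7,
              3, 4, 6, 8, 7, 5),
            nrow = 6,
            dimnames = list(paste0("probe", 1:6), paste0("array", 1:3)))

# Hypothetical helper implementing a basic quantile normalization.
quantile_normalize <- function(mat) {
  # Sort each column, then average across arrays at each rank.
  rank_means <- unname(rowMeans(apply(mat, 2, sort)))
  # Replace every value by the mean of its rank (ties broken by order).
  normalized <- apply(mat, 2, function(col) {
    rank_means[rank(col, ties.method = "first")]
  })
  dimnames(normalized) <- dimnames(mat)
  normalized
}

x_norm <- quantile_normalize(x)
x_norm

# After normalization every column holds the same set of values.
apply(x_norm, 2, function(col) unname(sort(col)))
```

Within each array the ordering of the probes is unchanged; only the values attached to each rank are replaced.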

3.4.2 Checking The Effect Of The Normalization

I’ll re-visualize the first few MA plots.

When I compare this plot to the one created for the raw intensities, I see a much more symmetric and even spread of the data indicating that the dependence of the variability on the average expression level is not as strong as it was before normalization.

Besides MA plots, boxplots also allow a comparison between the raw and the normalized data. I will show only the first few arrays so that the boxplots remain clear and legible.

Without using ggplot:

Using ggplot: How to create a box plot of normalized intensities?

After normalization, none of the samples should stand out from the rest. The different arrays should have the same (or at least a very comparable) median expression level. The scale of the boxes should also be almost the same, indicating that the spread of the intensity values on the different arrays is comparable.
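
These two criteria can also be checked numerically rather than by eye. The sketch below uses base R on simulated values; the matrix expr_norm, the helpers array_summary() and flag_outlying_arrays(), and the tolerance of 0.5 log2 units are all assumptions chosen for illustration, not thresholds taken from this analysis.

```r
set.seed(42)
# Simulated normalized log2 expression: 500 probesets x 6 arrays (invented).
expr_norm <- matrix(rnorm(3000, mean = 7, sd = 1), nrow = 500,
                    dimnames = list(NULL, paste0("array", 1:6)))
# Shift one array upwards on purpose so that it stands out.
expr_norm[, 6] <- expr_norm[, 6] + 1.5

# Hypothetical helper: median and interquartile range of each array.
array_summary <- function(mat) {
  data.frame(array = colnames(mat),
             median = apply(mat, 2, median),
             iqr = apply(mat, 2, IQR),
             row.names = NULL)
}

# Hypothetical helper: flag arrays whose median lies far from the others.
flag_outlying_arrays <- function(mat, tolerance = 0.5) {
  s <- array_summary(mat)
  # Distance of each array's median from the median of all array medians.
  s$deviation <- abs(s$median - median(s$median))
  s$flagged <- s$deviation > tolerance
  s
}

qc <- flag_outlying_arrays(expr_norm)
qc
```

With the real data, the same helpers could be applied to the matrix returned by Biobase::exprs(data.rma).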

3.5 Differential Gene Expression

The identification of differentially expressed (DE) genes is done neither by the affy nor by the oligo package but by the limma package, which takes the output of the rma() method, here data.rma, as input.
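
Before any model is fitted, the group labels have to be encoded as a design matrix. As a rough sketch, assuming three groups named control, sPTD and PPROM, this can be done with base R's model.matrix(); the nine toy labels and the contrast names below are hypothetical and stand in for the real sample annotation.

```r
# Hypothetical group labels for nine samples (not the study's 326 arrays).
level <- factor(c("control", "control", "control",
                  "sPTD", "sPTD", "sPTD",
                  "PPROM", "PPROM", "PPROM"),
                levels = c("control", "sPTD", "PPROM"))

# One indicator column per group, without an intercept.
design <- model.matrix(~ 0 + level)
colnames(design) <- levels(level)
design

# Contrast vectors for the two comparisons against the control group.
contrasts <- cbind(sPTD_vs_control  = c(control = -1, sPTD = 1, PPROM = 0),
                   PPROM_vs_control = c(control = -1, sPTD = 0, PPROM = 1))
contrasts
```

A design matrix and a contrast matrix of this shape are what limma's lmFit() and contrasts.fit() take as input, followed by eBayes() to moderate the test statistics.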

3.5.1 Three groups of samples

I will now compare sPTD and PPROM to a set of Control women.

First, I need to tell limma which samples are replicates and which samples belong to different groups. To this end, I will add a second column with sample annotation describing the source of each sample, and I will give this new column a name.

# Derive each sample's group by stripping the trailing number from its label in column 1
# (for example "control12" becomes "control" and "sPTD3" becomes "sPTD").
ph@data[ ,2] <- sub("[0-9]+$", "", ph@data[ ,1])
colnames(ph@data)[2] <- "level"; ph@data
##                                                         index   level
## GSM1437801_HTHuGene21_111412H_SL309_515122-1.CEL     control1 control
## GSM1437802_HTHuGene21_111412H_SL310_515122-2.CEL     control2 control
## GSM1437803_HTHuGene21_092712H_SL77_810384-1.CEL      control3 control
## GSM1437804_HTHuGene21_092712H_SL78_810384-2.CEL      control4 control
## GSM1437805_HTHuGene21_111912H_SL313_810392-1.CEL     control5 control
## GSM1437806_HTHuGene21_111912H_SL314_810392-2A.CEL    control6 control
## GSM1437807_HTHuGene21_101512H_SL181_810401-1.CEL     control7 control
## GSM1437808_HTHuGene21_101512H_SL182_810401-2.CEL     control8 control
## GSM1437809_HTHuGene21_101512H_SL169_810413-1.CEL        sPTD1    sPTD
## GSM1437810_HTHuGene21_101512H_SL170_810413-2.CEL        sPTD2    sPTD
## GSM1437811_HTHuGene21_100412H_SL139_810416-1.CEL     control9 control
## GSM1437812_HTHuGene21_100412H_SL140_810416-2.CEL    control10 control
## GSM1437813_HTHuGene21_091912H_SL57_810421-1.CEL     control11 control
## GSM1437814_HTHuGene21_091912H_SL58_810421-2.CEL     control12 control
## GSM1437815_HTHuGene21_102512H_SL225_810424-1_2.CEL  control13 control
## GSM1437816_HTHuGene21_102512H_SL226_810424-2.CEL    control14 control
## GSM1437817_HTHuGene21_102912H_SL261_810430-1.CEL    control15 control
## GSM1437818_HTHuGene21_102912H_SL262_810430-2C.CEL   control16 control
## GSM1437819_HTHuGene21_101812H_SL211_810432-1.CEL    control17 control
## GSM1437820_HTHuGene21_101812H_SL212_810432-2.CEL    control18 control
## GSM1437821_HTHuGene21_092712H_SL81_810439-1B.CEL    control19 control
## GSM1437822_HTHuGene21_092712H_SL82_810439-2A.CEL    control20 control
## GSM1437823_HTHuGene21_102512H_SL235_810447-1.CEL    control21 control
## GSM1437824_HTHuGene21_102512H_SL236_810447-2.CEL    control22 control
## GSM1437825_HTHuGene21_100212H_SL106_810460-1.CEL    control23 control
## GSM1437826_HTHuGene21_100212H_SL103_810460-2.CEL    control24 control
## GSM1437827_HTHuGene21_102912H_SL253_810462-1.CEL    control25 control
## GSM1437828_HTHuGene21_102912H_SL254_810462-2.CEL    control26 control
## GSM1437829_HTHuGene21_111412H_SL300_810469-1.CEL    control27 control
## GSM1437830_HTHuGene21_111412H_SL297_810469-2_2.CEL  control28 control
## GSM1437831_HTHuGene21_100912H_SL164_810477-1.CEL        sPTD3    sPTD
## GSM1437832_HTHuGene21_100412H_SL125_810494-1.CEL    control29 control
## GSM1437833_HTHuGene21_100412H_SL126_810494-2.CEL    control30 control
## GSM1437834_HTHuGene21_092712H_SL73_810501-1.CEL     control31 control
## GSM1437835_HTHuGene21_092712H_SL74_810501-2.CEL     control32 control
## GSM1437836_HTHuGene21_092712H_SL87_810507-1.CEL     control33 control
## GSM1437837_HTHuGene21_092712H_SL88_810507-2.CEL     control34 control
## GSM1437838_HTHuGene21_100212H_SL99_810516-1.CEL     control35 control
## GSM1437839_HTHuGene21_100212H_SL100_810516-2.CEL    control36 control
## GSM1437840_HTHuGene21_100412H_SL133_810518-1.CEL    control37 control
## GSM1437841_HTHuGene21_100412H_SL134_810518-2.CEL    control38 control
## GSM1437842_HTHuGene21_101812H_SL202_810521-1.CEL    control39 control
## GSM1437843_HTHuGene21_101812H_SL203_810521-2.CEL    control40 control
## GSM1437844_HTHuGene21_102512H_SL227_810529-1.CEL       PPROM1   PPROM
## GSM1437845_HTHuGene21_102512H_SL228_810529-2.CEL       PPROM2   PPROM
## GSM1437846_HTHuGene21_101512H_SL183_810533-1.CEL    control41 control
## GSM1437847_HTHuGene21_101512H_SL184_810533-2.CEL    control42 control
## GSM1437848_HTHuGene21_101512H_SL173_810545-1.CEL    control43 control
## GSM1437849_HTHuGene21_101512H_SL174_810545-2.CEL    control44 control
## GSM1437850_HTHuGene21_082912H_SL19_810563-1.CEL     control45 control
## GSM1437851_HTHuGene21_082912H_SL20_810563-2.CEL     control46 control
## GSM1437852_HTHuGene21_102512H_SL229_810568-1.CEL    control47 control
## GSM1437853_HTHuGene21_102512H_SL230_810568-2.CEL    control48 control
## GSM1437854_HTHuGene21_100212H_SL109_810619-1.CEL    control49 control
## GSM1437855_HTHuGene21_100212H_SL110_810619-2.CEL    control50 control
## GSM1437856_HTHuGene21_091712H_SL39_810657-1.CEL         sPTD4    sPTD
## GSM1437857_HTHuGene21_091712H_SL40_810657-2.CEL         sPTD5    sPTD
## GSM1437858_HTHuGene21_092712H_SL95_812226-1.CEL     control51 control
## GSM1437859_HTHuGene21_092712H_SL96_812226-2.CEL     control52 control
## GSM1437860_HTHuGene21_082912H_SL8_812228-1.CEL          sPTD6    sPTD
## GSM1437861_HTHuGene21_082912H_SL9_812228-2.CEL          sPTD7    sPTD
## GSM1437862_HTHuGene21_102512H_SL217_812230-1_2.CEL  control53 control
## GSM1437863_HTHuGene21_102512H_SL218_812230-2.CEL    control54 control
## GSM1437864_HTHuGene21_101512H_SL191_812232-1.CEL    control55 control
## GSM1437865_HTHuGene21_101512H_SL192_812232-2.CEL    control56 control
## GSM1437866_HTHuGene21_101812H_SL213_812234-1.CEL    control57 control
## GSM1437867_HTHuGene21_101812H_SL214_812234-2.CEL    control58 control
## GSM1437868_HTHuGene21_100212H_SL111_812235-1.CEL    control59 control
## GSM1437869_HTHuGene21_100212H_SL112_812235-2.CEL    control60 control
## GSM1437870_HTHuGene21_101812H_SL204_812236-1.CEL    control61 control
## GSM1437871_HTHuGene21_101812H_SL199_812236-2.CEL    control62 control
## GSM1437872_HTHuGene21_111412H_SL311_812249-1.CEL    control63 control
## GSM1437873_HTHuGene21_111412H_SL312_812249-2.CEL    control64 control
## GSM1437874_HTHuGene21_111912H_SL335_812261-1.CEL    control65 control
## GSM1437875_HTHuGene21_111912H_SL336_812261-2.CEL    control66 control
## GSM1437876_HTHuGene21_101512H_SL171_812268-1.CEL    control67 control
## GSM1437877_HTHuGene21_101512H_SL172_812268-2.CEL    control68 control
## GSM1437878_HTHuGene21_082912H_SL18_812282-1.CEL        PPROM3   PPROM
## GSM1437879_HTHuGene21_100212H_SL115_812285-1.CEL        sPTD8    sPTD
## GSM1437880_HTHuGene21_100212H_SL116_812285-2.CEL        sPTD9    sPTD
## GSM1437881_HTHuGene21_082912H_SL10_812292-1.CEL     control69 control
## GSM1437882_HTHuGene21_082912H_SL11_812292-2.CEL     control70 control
## GSM1437883_HTHuGene21_111912H_SL323_812296-1.CEL    control71 control
## GSM1437884_HTHuGene21_111912H_SL324_812296-2.CEL    control72 control
## GSM1437885_HTHuGene21_091912H_SL49_812302-1.CEL        PPROM4   PPROM
## GSM1437886_HTHuGene21_091912H_SL50_812302-2.CEL        PPROM5   PPROM
## GSM1437887_HTHuGene21_110512H_SL268_812309-1.CEL       sPTD10    sPTD
## GSM1437888_HTHuGene21_110512H_SL265_812309-2.CEL       sPTD11    sPTD
## GSM1437889_HTHuGene21_100212H_SL97_812324-1.CEL     control73 control
## GSM1437890_HTHuGene21_100212H_SL98_812324-2.CEL     control74 control
## GSM1437891_HTHuGene21_102912H_SL255_812329-1.CEL    control75 control
## GSM1437892_HTHuGene21_102912H_SL256_812329-2.CEL    control76 control
## GSM1437893_HTHuGene21_092712H_SL75_812342-1.CEL        PPROM6   PPROM
## GSM1437894_HTHuGene21_092712H_SL76_812342-2.CEL        PPROM7   PPROM
## GSM1437895_HTHuGene21_110512H_SL269_812344-1.CEL    control77 control
## GSM1437896_HTHuGene21_110512H_SL270_812344-2.CEL    control78 control
## GSM1437897_HTHuGene21_092712H_SL89_812359-1.CEL        PPROM8   PPROM
## GSM1437898_HTHuGene21_092712H_SL90_812359-2.CEL        PPROM9   PPROM
## GSM1437899_HTHuGene21_092712H_SL83_812366-1.CEL     control79 control
## GSM1437900_HTHuGene21_092712H_SL84_812366-2.CEL     control80 control
## GSM1437901_HTHuGene21_110512H_SL277_812387-1.CEL    control81 control
## GSM1437902_HTHuGene21_110512H_SL278_812387-2.CEL    control82 control
## GSM1437903_HTHuGene21_101812H_SL195_812396-1.CEL    control83 control
## GSM1437904_HTHuGene21_101812H_SL196_812396-2.CEL    control84 control
## GSM1437905_HTHuGene21_110512H_SL283_812407-1.CEL    control85 control
## GSM1437906_HTHuGene21_110512H_SL284_812407-2.CEL    control86 control
## GSM1437907_HTHuGene21_100212H_SL104_812448-1.CEL      PPROM10   PPROM
## GSM1437908_HTHuGene21_100212H_SL105_812448-2.CEL      PPROM11   PPROM
## GSM1437909_HTHuGene21_100412H_SL123_812459-1.CEL      PPROM12   PPROM
## GSM1437910_HTHuGene21_100412H_SL124_812459-2.CEL      PPROM13   PPROM
## GSM1437911_HTHuGene21_102512H_SL219_812477-1.CEL    control87 control
## GSM1437912_HTHuGene21_102512H_SL221_812477-2.CEL    control88 control
## GSM1437913_HTHuGene21_101512H_SL177_812509-1.CEL    control89 control
## GSM1437914_HTHuGene21_101512H_SL178_812509-2.CEL    control90 control
## GSM1437915_HTHuGene21_100412H_SL121_812518-1.CEL    control91 control
## GSM1437916_HTHuGene21_100412H_SL122_812518-2C.CEL   control92 control
## GSM1437917_HTHuGene21_102912H_SL241_812546-1_2.CEL  control93 control
## GSM1437918_HTHuGene21_102912H_SL242_812546-2.CEL    control94 control
## GSM1437919_HTHuGene21_111912H_SL315_812551-1.CEL    control95 control
## GSM1437920_HTHuGene21_111912H_SL316_812551-2.CEL    control96 control
## GSM1437921_HTHuGene21_100912H_SL149_812555-1.CEL      PPROM14   PPROM
## GSM1437922_HTHuGene21_111412H_SL298_812559-1.CEL    control97 control
## GSM1437923_HTHuGene21_111412H_SL299_812559-2.CEL    control98 control
## GSM1437924_HTHuGene21_091912H_SL69_812562-1.CEL     control99 control
## GSM1437925_HTHuGene21_091912H_SL70_812562-2.CEL    control100 control
## GSM1437926_HTHuGene21_100912H_SL150_812566-1.CEL      PPROM15   PPROM
## GSM1437927_HTHuGene21_100912H_SL151_812566-2.CEL      PPROM16   PPROM
## GSM1437928_HTHuGene21_101512H_SL175_812573-1.CEL   control101 control
## GSM1437929_HTHuGene21_101512H_SL176_812573-2.CEL   control102 control
## GSM1437930_HTHuGene21_111412H_SL289_812574-1_2.CEL control103 control
## GSM1437931_HTHuGene21_111412H_SL290_812574-2.CEL   control104 control
## GSM1437932_HTHuGene21_100912H_SL147_812581-1.CEL   control105 control
## GSM1437933_HTHuGene21_100912H_SL148_812581-2.CEL   control106 control
## GSM1437934_HTHuGene21_100412H_SL129_812586-1_2.CEL control107 control
## GSM1437935_HTHuGene21_100412H_SL130_812586-2.CEL   control108 control
## GSM1437936_HTHuGene21_100912H_SL158_812587-1.CEL      PPROM17   PPROM
## GSM1437937_HTHuGene21_100912H_SL159_812587-2.CEL      PPROM18   PPROM
## GSM1437938_HTHuGene21_091912H_SL59_812590-1.CEL    control109 control
## GSM1437939_HTHuGene21_091912H_SL60_812590-2.CEL    control110 control
## GSM1437940_HTHuGene21_082912H_SL4_815072-1.CEL     control111 control
## GSM1437941_HTHuGene21_082912H_SL5_815072-2.CEL     control112 control
## GSM1437942_HTHuGene21_082912H_SL16_815073-1.CEL        sPTD12    sPTD
## GSM1437943_HTHuGene21_082912H_SL17_815073-2.CEL        sPTD13    sPTD
## GSM1437944_HTHuGene21_091712H_SL27_815076-1.CEL       PPROM19   PPROM
## GSM1437945_HTHuGene21_091712H_SL28_815076-2.CEL       PPROM20   PPROM
## GSM1437946_HTHuGene21_082912H_SL1_815082-1.CEL        PPROM21   PPROM
## GSM1437947_HTHuGene21_110512H_SL273_815094-1.CEL   control113 control
## GSM1437948_HTHuGene21_110512H_SL274_815094-2.CEL   control114 control
## GSM1437949_HTHuGene21_091912H_SL63_815102-1.CEL    control115 control
## GSM1437950_HTHuGene21_091912H_SL64_815102-2.CEL    control116 control
## GSM1437951_HTHuGene21_091712H_SL33_815116-1.CEL       PPROM22   PPROM
## GSM1437952_HTHuGene21_091712H_SL34_815116-2.CEL       PPROM23   PPROM
## GSM1437953_HTHuGene21_100912H_SL156_815123-1.CEL   control117 control
## GSM1437954_HTHuGene21_100912H_SL157_815123-2.CEL   control118 control
## GSM1437955_HTHuGene21_091712H_SL31_815127-1.CEL    control119 control
## GSM1437956_HTHuGene21_091712H_SL32_815127-2.CEL    control120 control
## GSM1437957_HTHuGene21_082912H_SL2_815137-1.CEL     control121 control
## GSM1437958_HTHuGene21_082912H_SL3_815137-2.CEL     control122 control
## GSM1437959_HTHuGene21_091912H_SL55_815149-1.CEL       PPROM24   PPROM
## GSM1437960_HTHuGene21_091912H_SL56_815149-2.CEL       PPROM25   PPROM
## GSM1437961_HTHuGene21_102912H_SL245_815154-1.CEL   control123 control
## GSM1437962_HTHuGene21_102912H_SL246_815154-2.CEL   control124 control
## GSM1437963_HTHuGene21_082912H_SL21_815163-1.CEL    control125 control
## GSM1437964_HTHuGene21_082912H_SL22_815163-2.CEL    control126 control
## GSM1437965_HTHuGene21_110512H_SL285_815168-1.CEL   control127 control
## GSM1437966_HTHuGene21_110512H_SL286_815168-2.CEL   control128 control
## GSM1437967_HTHuGene21_091912H_SL67_815179-1.CEL       PPROM26   PPROM
## GSM1437968_HTHuGene21_091912H_SL68_815179-2.CEL       PPROM27   PPROM
## GSM1437969_HTHuGene21_110512H_SL266_815183-1.CEL   control129 control
## GSM1437970_HTHuGene21_110512H_SL267_815183-2.CEL   control130 control
## GSM1437971_HTHuGene21_102912H_SL247_815189-1.CEL   control131 control
## GSM1437972_HTHuGene21_102912H_SL248_815189-2.CEL   control132 control
## GSM1437973_HTHuGene21_082912H_SL14_815194-1B.CEL   control133 control
## GSM1437974_HTHuGene21_082912H_SL15_815194-2.CEL    control134 control
## GSM1437975_HTHuGene21_101512H_SL187_815196-1.CEL   control135 control
## GSM1437976_HTHuGene21_101512H_SL188_815196-2.CEL   control136 control
## GSM1437977_HTHuGene21_092712H_SL79_815200-1.CEL        sPTD14    sPTD
## GSM1437978_HTHuGene21_092712H_SL80_815200-2.CEL        sPTD15    sPTD
## GSM1437979_HTHuGene21_092712H_SL93_815218-1.CEL        sPTD16    sPTD
## GSM1437980_HTHuGene21_092712H_SL94_815218-2.CEL        sPTD17    sPTD
## GSM1437981_HTHuGene21_110512H_SL287_815219-1.CEL   control137 control
## GSM1437982_HTHuGene21_110512H_SL288_815219-2.CEL   control138 control
## GSM1437983_HTHuGene21_100412H_SL137_818022-1.CEL   control139 control
## GSM1437984_HTHuGene21_100412H_SL138_818022-2.CEL   control140 control
## GSM1437985_HTHuGene21_100412H_SL135_818023-1.CEL      PPROM28   PPROM
## GSM1437986_HTHuGene21_100412H_SL136_818023-2.CEL      PPROM29   PPROM
## GSM1437987_HTHuGene21_091912H_SL71_818025-1.CEL    control141 control
## GSM1437988_HTHuGene21_091912H_SL72_818025-2.CEL    control142 control
## GSM1437989_HTHuGene21_091712H_SL35_818032-1.CEL    control143 control
## GSM1437990_HTHuGene21_091712H_SL36_818032-2.CEL    control144 control
## GSM1437991_HTHuGene21_100412H_SL141_818034-1.CEL      PPROM30   PPROM
## GSM1437992_HTHuGene21_100412H_SL142_818034-2.CEL      PPROM31   PPROM
## GSM1437993_HTHuGene21_091712H_SL41_818036-1.CEL    control145 control
## GSM1437994_HTHuGene21_091712H_SL42_818036-2.CEL    control146 control
## GSM1437995_HTHuGene21_111912H_SL319_818046-1.CEL   control147 control
## GSM1437996_HTHuGene21_111912H_SL320_818046-2.CEL   control148 control
## GSM1437997_HTHuGene21_102912H_SL249_818054-1_2.CEL control149 control
## GSM1437998_HTHuGene21_102912H_SL250_818054-2.CEL   control150 control
## GSM1437999_HTHuGene21_110512H_SL279_818070-1.CEL   control151 control
## GSM1438000_HTHuGene21_110512H_SL280_818070-2.CEL   control152 control
## GSM1438001_HTHuGene21_101812H_SL205_818081-1.CEL   control153 control
## GSM1438002_HTHuGene21_101812H_SL206_818081-2.CEL   control154 control
## GSM1438003_HTHuGene21_091912H_SL51_818084-1.CEL    control155 control
## GSM1438004_HTHuGene21_091912H_SL52_818084-2.CEL    control156 control
## GSM1438005_HTHuGene21_092712H_SL85_818088-1.CEL    control157 control
## GSM1438006_HTHuGene21_092712H_SL86_818088-2.CEL    control158 control
## GSM1438007_HTHuGene21_111412H_SL301_818125-1.CEL   control159 control
## GSM1438008_HTHuGene21_111412H_SL302_818125-2.CEL   control160 control
## GSM1438009_HTHuGene21_091712H_SL37_818153-1.CEL    control161 control
## GSM1438010_HTHuGene21_091712H_SL38_818153-2A.CEL   control162 control
## GSM1438011_HTHuGene21_091712H_SL45_818156-1.CEL    control163 control
## GSM1438012_HTHuGene21_091712H_SL46_818156-2.CEL    control164 control
## GSM1438013_HTHuGene21_101512H_SL185_818162-1.CEL      PPROM32   PPROM
## GSM1438014_HTHuGene21_101512H_SL186_818162-2.CEL      PPROM33   PPROM
## GSM1438015_HTHuGene21_101812H_SL198_818172-1.CEL   control165 control
## GSM1438016_HTHuGene21_101812H_SL197_818172-2.CEL   control166 control
## GSM1438017_HTHuGene21_100212H_SL107_818174-1.CEL   control167 control
## GSM1438018_HTHuGene21_100212H_SL108_818174-2.CEL   control168 control
## GSM1438019_HTHuGene21_100212H_SL117_818181-1.CEL   control169 control
## GSM1438020_HTHuGene21_100212H_SL118_818181-2.CEL   control170 control
## GSM1438021_HTHuGene21_101512H_SL189_818195-1.CEL      PPROM34   PPROM
## GSM1438022_HTHuGene21_101512H_SL190_818195-2.CEL      PPROM35   PPROM
## GSM1438023_HTHuGene21_111412H_SL305_818200-1.CEL   control171 control
## GSM1438024_HTHuGene21_111412H_SL306_818200-2.CEL   control172 control
## GSM1438025_HTHuGene21_101812H_SL193_818224-1_2.CEL    PPROM36   PPROM
## GSM1438026_HTHuGene21_101812H_SL194_818224-2.CEL      PPROM37   PPROM
## GSM1438027_HTHuGene21_101812H_SL200_818241-1.CEL      PPROM38   PPROM
## GSM1438028_HTHuGene21_101812H_SL201_818241-2.CEL      PPROM39   PPROM
## GSM1438029_HTHuGene21_091912H_SL53_818246-1.CEL    control173 control
## GSM1438030_HTHuGene21_091912H_SL54_818246-2.CEL    control174 control
## GSM1438031_HTHuGene21_101812H_SL207_818249-1.CEL      PPROM40   PPROM
## GSM1438032_HTHuGene21_101812H_SL208_818249-2.CEL      PPROM41   PPROM
## GSM1438033_HTHuGene21_100912H_SL152_818257-1.CEL   control175 control
## GSM1438034_HTHuGene21_100912H_SL153_818257-2.CEL   control176 control
## GSM1438035_HTHuGene21_100912H_SL162_818308-1.CEL   control177 control
## GSM1438036_HTHuGene21_100912H_SL163_818308-2.CEL   control178 control
## GSM1438037_HTHuGene21_110512H_SL275_818357-1.CEL   control179 control
## GSM1438038_HTHuGene21_110512H_SL276_818357-2B.CEL  control180 control
## GSM1438039_HTHuGene21_111912H_SL325_818361-1i.CEL  control181 control
## GSM1438040_HTHuGene21_111912H_SL326_818361-2A.CEL  control182 control
## GSM1438041_HTHuGene21_102512H_SL231_818368-1.CEL      PPROM42   PPROM
## GSM1438042_HTHuGene21_102512H_SL232_818368-2.CEL      PPROM43   PPROM
## GSM1438043_HTHuGene21_100912H_SL145_818381-1B.CEL  control183 control
## GSM1438044_HTHuGene21_100912H_SL146_818381-2A.CEL  control184 control
## GSM1438045_HTHuGene21_102912H_SL243_818409-1.CEL      PPROM44   PPROM
## GSM1438046_HTHuGene21_102912H_SL244_818409-2.CEL      PPROM45   PPROM
## GSM1438047_HTHuGene21_110512H_SL271_818481-1.CEL      PPROM46   PPROM
## GSM1438048_HTHuGene21_110512H_SL272_818481-2.CEL      PPROM47   PPROM
## GSM1438049_HTHuGene21_110512H_SL281_818614-1A.CEL     PPROM48   PPROM
## GSM1438050_HTHuGene21_110512H_SL282_818614-2C.CEL     PPROM49   PPROM
## GSM1438051_HTHuGene21_111412H_SL291_818615-1.CEL       sPTD18    sPTD
## GSM1438052_HTHuGene21_111412H_SL292_818615-2.CEL       sPTD19    sPTD
## GSM1438053_HTHuGene21_111412H_SL295_818626-1.CEL      PPROM50   PPROM
## GSM1438054_HTHuGene21_111412H_SL296_818626-2.CEL      PPROM51   PPROM
## GSM1438055_HTHuGene21_111412H_SL303_818670-1A.CEL      sPTD20    sPTD
## GSM1438056_HTHuGene21_111412H_SL304_818670-2.CEL       sPTD21    sPTD
## GSM1438057_HTHuGene21_111412H_SL307_818684-1.CEL      PPROM52   PPROM
## GSM1438058_HTHuGene21_111412H_SL308_818684-2.CEL      PPROM53   PPROM
## GSM1438059_HTHuGene21_111912H_SL317_818781-1C.CEL     PPROM54   PPROM
## GSM1438060_HTHuGene21_111912H_SL318_818781-2A.CEL     PPROM55   PPROM
## GSM1438061_HTHuGene21_111912H_SL321_818827-1.CEL      PPROM56   PPROM
## GSM1438062_HTHuGene21_111912H_SL322_818827-2.CEL      PPROM57   PPROM
## GSM1438063_HTHuGene21_111912H_SL329_830347-1B.CEL  control185 control
## GSM1438064_HTHuGene21_111912H_SL330_830347-2.CEL   control186 control
## GSM1438065_HTHuGene21_091912H_SL61_830356-1.CEL    control187 control
## GSM1438066_HTHuGene21_091912H_SL62_830356-2.CEL    control188 control
## GSM1438067_HTHuGene21_111912H_SL331_830370-1.CEL   control189 control
## GSM1438068_HTHuGene21_111912H_SL332_830370-2.CEL   control190 control
## GSM1438069_HTHuGene21_111412H_SL293_830381-1.CEL   control191 control
## GSM1438070_HTHuGene21_111412H_SL294_830381-2.CEL   control192 control
## GSM1438071_HTHuGene21_100212H_SL101_830397-1.CEL       sPTD22    sPTD
## GSM1438072_HTHuGene21_100212H_SL102_830397-2.CEL       sPTD23    sPTD
## GSM1438073_HTHuGene21_100212H_SL113_830398-1.CEL      PPROM58   PPROM
## GSM1438074_HTHuGene21_100212H_SL114_830398-2.CEL      PPROM59   PPROM
## GSM1438075_HTHuGene21_100912H_SL160_830432-1.CEL   control193 control
## GSM1438076_HTHuGene21_100912H_SL161_830432-2.CEL   control194 control
## GSM1438077_HTHuGene21_100412H_SL127_830446-1.CEL       sPTD24    sPTD
## GSM1438078_HTHuGene21_100412H_SL128_830446-2.CEL       sPTD25    sPTD
## GSM1438079_HTHuGene21_082912H_SL23_830478-1.CEL    control195 control
## GSM1438080_HTHuGene21_082912H_SL24_830478-2.CEL    control196 control
## GSM1438081_HTHuGene21_101512H_SL179_830505-1.CEL      PPROM60   PPROM
## GSM1438082_HTHuGene21_101512H_SL180_830505-2.CEL      PPROM61   PPROM
## GSM1438083_HTHuGene21_091712H_SL47_830507-1.CEL    control197 control
## GSM1438084_HTHuGene21_091712H_SL48_830507-2.CEL    control198 control
## GSM1438085_HTHuGene21_102512H_SL237_830515-1.CEL   control199 control
## GSM1438086_HTHuGene21_102512H_SL238_830515-2.CEL   control200 control
## GSM1438087_HTHuGene21_100912H_SL165_830518-1.CEL   control201 control
## GSM1438088_HTHuGene21_100912H_SL166_830518-2.CEL   control202 control
## GSM1438089_HTHuGene21_100412H_SL131_830538-1.CEL   control203 control
## GSM1438090_HTHuGene21_100412H_SL132_830538-2.CEL   control204 control
## GSM1438091_HTHuGene21_100912H_SL154_830544-1.CEL   control205 control
## GSM1438092_HTHuGene21_100912H_SL155_830544-2.CEL   control206 control
## GSM1438093_HTHuGene21_100912H_SL167_830554-1C.CEL  control207 control
## GSM1438094_HTHuGene21_100912H_SL168_830554-2A.CEL  control208 control
## GSM1438095_HTHuGene21_101812H_SL215_830560-1.CEL       sPTD26    sPTD
## GSM1438096_HTHuGene21_101812H_SL216_830560-2.CEL       sPTD27    sPTD
## GSM1438097_HTHuGene21_102912H_SL263_830561-1.CEL   control209 control
## GSM1438098_HTHuGene21_102912H_SL264_830561-2.CEL   control210 control
## GSM1438099_HTHuGene21_101812H_SL209_830575-1.CEL   control211 control
## GSM1438100_HTHuGene21_101812H_SL210_830575-2.CEL   control212 control
## GSM1438101_HTHuGene21_091712H_SL25_830576-1.CEL    control213 control
## GSM1438102_HTHuGene21_091712H_SL26_830576-2A.CEL   control214 control
## GSM1438103_HTHuGene21_092712H_SL91_830584-1.CEL    control215 control
## GSM1438104_HTHuGene21_092712H_SL92_830584-2.CEL    control216 control
## GSM1438105_HTHuGene21_100412H_SL143_830587-1.CEL   control217 control
## GSM1438106_HTHuGene21_100412H_SL144_830587-2.CEL   control218 control
## GSM1438107_HTHuGene21_102512H_SL220_830590-1.CEL       sPTD28    sPTD
## GSM1438108_HTHuGene21_102512H_SL222_830590-2.CEL       sPTD29    sPTD
## GSM1438109_HTHuGene21_082912H_SL6_830597-1.CEL     control219 control
## GSM1438110_HTHuGene21_082912H_SL7_830597-2A.CEL    control220 control
## GSM1438111_HTHuGene21_100212H_SL119_830607-1.CEL   control221 control
## GSM1438112_HTHuGene21_100212H_SL120_830607-2.CEL   control222 control
## GSM1438113_HTHuGene21_091712H_SL29_830656-1.CEL    control223 control
## GSM1438114_HTHuGene21_091712H_SL30_830656-2.CEL    control224 control
## GSM1438115_HTHuGene21_102512H_SL233_830692-1.CEL   control225 control
## GSM1438116_HTHuGene21_102512H_SL234_830692-2.CEL   control226 control
## GSM1438117_HTHuGene21_102512H_SL223_830741-1.CEL   control227 control
## GSM1438118_HTHuGene21_102512H_SL224_830741-2.CEL   control228 control
## GSM1438119_HTHuGene21_102912H_SL251_830762-1.CEL      PPROM62   PPROM
## GSM1438120_HTHuGene21_102912H_SL252_830762-2.CEL      PPROM63   PPROM
## GSM1438121_HTHuGene21_111912H_SL333_830790-1.CEL      PPROM64   PPROM
## GSM1438122_HTHuGene21_111912H_SL334_830790-2.CEL      PPROM65   PPROM
## GSM1438123_HTHuGene21_091712H_SL43_830872-1A.CEL      PPROM66   PPROM
## GSM1438124_HTHuGene21_091712H_SL44_830872-2A.CEL      PPROM67   PPROM
## GSM1438125_HTHuGene21_102912H_SL257_830909-1.CEL      PPROM68   PPROM
## GSM1438126_HTHuGene21_102912H_SL258_830909-2.CEL      PPROM69   PPROM

So, the factor that determines the grouping will have 3 levels.

groups <- ph@data$level
f <- factor(groups,levels=c("control","sPTD","PPROM"))

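Then, I need to create a design matrix, which is a matrix of indicator values that encodes the grouping variable. ANOVA needs such a matrix to know which samples belong to which group. Since limma fits a linear model per gene, which for a single grouping factor amounts to a one-way ANOVA, it needs such a design matrix as well. I will create it using the model.matrix() method. The argument of the model.matrix() method is a model formula; here, ~ 0 + f removes the intercept so that each column corresponds to one group.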

design <- model.matrix(~ 0 + f)
colnames(design) <- levels(f)
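To make the structure of such a design matrix concrete, here is a minimal base R sketch on a toy grouping vector. The names groups_toy, f_toy and design_toy are hypothetical and only stand in for the real groups, f and design objects.

```r
# Toy grouping vector (hypothetical; the real one comes from ph@data$level)
groups_toy <- c("control", "control", "sPTD", "PPROM", "PPROM")
f_toy <- factor(groups_toy, levels = c("control", "sPTD", "PPROM"))

# "~ 0 + f_toy" drops the intercept, so each column is an indicator for one group
design_toy <- model.matrix(~ 0 + f_toy)
colnames(design_toy) <- levels(f_toy)
print(design_toy)

# Every sample belongs to exactly one group, so each row sums to 1,
# and the column sums give the group sizes.
print(rowSums(design_toy))
print(colSums(design_toy))
```

In the real design matrix the column sums would give the number of control, sPTD and PPROM arrays.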

#Fit linear model for each gene given a series of arrays
#arguments:
#object: A matrix-like data object containing log-ratios or log-expression values for a series of arrays, with rows corresponding to genes and columns to samples. Any type of data object that can be processed by getEAWP is acceptable.
#design: the design matrix of the microarray experiment, with rows corresponding to arrays and columns to coefficients to be estimated. Defaults to the unit vector meaning that the arrays are treated as replicates.
data.fit <- lmFit(data.rma, design)

Afterwards, I need to tell limma which groups I want to compare. For this I define a contrast matrix defining the contrasts of interest by using the makeContrasts() method.

#makeContrasts() -> Construct the contrast matrix corresponding to specified contrasts of a set of parameters.
cont.matrix <- makeContrasts(a=sPTD-control,b=PPROM-control,c=sPTD-PPROM,levels=design)

#contrasts.fit() -> Given a linear model fit to microarray data, compute estimated coefficients and standard errors for a given set of contrasts.
data.contr <- contrasts.fit(data.fit,cont.matrix)
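What a contrast matrix does can be sketched by hand in base R. The group means below are hypothetical, and the group sizes (228 control, 29 sPTD, 69 PPROM) are an assumption read off the sample labels in the phenotype table. The sketch is only meant to illustrate the idea, not to replace makeContrasts() or contrasts.fit().

```r
# Contrast matrix written by hand: rows are groups, columns are contrasts
cont_toy <- cbind(a = c(-1, 1,  0),   # sPTD  - control
                  b = c(-1, 0,  1),   # PPROM - control
                  c = c( 0, 1, -1))   # sPTD  - PPROM
rownames(cont_toy) <- c("control", "sPTD", "PPROM")

# Hypothetical group means (log2 scale) for a single probe set
group_means <- c(control = 5.0, sPTD = 5.6, PPROM = 5.2)

# A contrast estimate is a weighted sum of the group means
est <- drop(group_means %*% cont_toy)
print(est)

# Assumed group sizes, taken from the sample labels (control228, sPTD29, PPROM69)
n <- c(control = 228, sPTD = 29, PPROM = 69)

# Unscaled standard deviation of each contrast: sqrt(sum(weight^2 / n))
sd_unscaled <- sqrt(colSums(cont_toy^2 / n))
print(round(sd_unscaled, 7))
```

Under these assumptions, sd_unscaled should be close to the values that limma reports in the stdev.unscaled slot, and contrast c is simply a minus b.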

#eBayes() -> Given a microarray linear model fit, compute moderated t-statistics, moderated F-statistic, and log-odds of differential expression by empirical Bayes moderation of the standard errors towards a common value.
data.fit.eb <- eBayes(data.contr)
data.fit.eb
## An object of class "MArrayLM"
## $coefficients
##           Contrasts
##                       a            b            c
##   16650001 -0.006471273 -0.079911687  0.073440415
##   16650003  0.063567350  0.065018301 -0.001450951
##   16650005  0.035850796 -0.001419566  0.037270362
##   16650007  0.178746843  0.036739566  0.142007277
##   16650009 -0.019497922 -0.042810615  0.023312693
## 53612 more rows ...
## 
## $rank
## [1] 3
## 
## $assign
## [1] 1 1 1
## 
## $qr
## $qr
##        control      sPTD     PPROM
## 1 -15.09966887  0.000000  0.000000
## 2   0.06622662 -5.385165  0.000000
## 3   0.06622662  0.000000 -8.306624
## 4   0.06622662  0.000000  0.000000
## 5   0.06622662  0.000000  0.000000
## 321 more rows ...
## 
## $qraux
## [1] 1.066227 1.000000 1.000000
## 
## $pivot
## [1] 1 2 3
## 
## $tol
## [1] 1e-07
## 
## $rank
## [1] 3
## 
## 
## $df.residual
## [1] 323 323 323 323 323
## 53612 more elements ...
## 
## $sigma
##  16650001  16650003  16650005  16650007  16650009 
## 0.5887229 0.6501848 0.7107225 0.6680571 0.3141969 
## 53612 more elements ...
## 
## $cov.coefficients
##          Contrasts
## Contrasts           a            b           c
##         a 0.038868724  0.004385965  0.03448276
##         b 0.004385965  0.018878719 -0.01449275
##         c 0.034482759 -0.014492754  0.04897551
## 
## $stdev.unscaled
##           Contrasts
##                    a         b         c
##   16650001 0.1971515 0.1373998 0.2213041
##   16650003 0.1971515 0.1373998 0.2213041
##   16650005 0.1971515 0.1373998 0.2213041
##   16650007 0.1971515 0.1373998 0.2213041
##   16650009 0.1971515 0.1373998 0.2213041
## 53612 more rows ...
## 
## $pivot
## [1] 1 2 3
## 
## $Amean
## 16650001 16650003 16650005 16650007 16650009 
## 2.091280 3.249681 2.495044 3.774363 1.701569 
## 53612 more elements ...
## 
## $method
## [1] "ls"
## 
## $design
##   control sPTD PPROM
## 1       1    0     0
## 2       1    0     0
## 3       1    0     0
## 4       1    0     0
## 5       1    0     0
## 321 more rows ...
## 
## $contrasts
##          Contrasts
## Levels     a  b  c
##   control -1 -1  0
##   sPTD     1  0  1
##   PPROM    0  1 -1
## 
## $df.prior
## [1] 3.66246
## 
## $s2.prior
## [1] 0.0495584
## 
## $var.prior
## [1] 0.3401842 0.2780299 0.3498472
## 
## $proportion
## [1] 0.01
## 
## $s2.post
##   16650001   16650003   16650005   16650007   16650009 
## 0.34326438 0.41855624 0.50001877 0.44185211 0.09816852 
## 53612 more elements ...
## 
## $t
##           Contrasts
##                      a           b           c
##   16650001 -0.05602414 -0.99268092  0.56641045
##   16650003  0.49837593  0.73142946 -0.01013413
##   16650005  0.25716124 -0.01461087  0.23816666
##   16650007  1.36395416  0.40226227  0.96534520
##   16650009 -0.31564729 -0.99444114  0.33621480
## 53612 more rows ...
## 
## $df.total
## [1] 326.6625 326.6625 326.6625 326.6625 326.6625
## 53612 more elements ...
## 
## $p.value
##           Contrasts
##                    a         b         c
##   16650001 0.9553568 0.3216002 0.5715037
##   16650003 0.6185545 0.4650412 0.9919205
##   16650005 0.7972162 0.9883515 0.8119012
##   16650007 0.1735211 0.6877541 0.3350860
##   16650009 0.7524718 0.3207442 0.7369248
## 53612 more rows ...
## 
## $lods
##           Contrasts
##                    a         b         c
##   16650001 -5.732450 -5.510764 -5.502653
##   16650003 -5.622114 -5.721780 -5.643673
##   16650005 -5.704100 -5.972714 -5.618766
##   16650007 -4.899123 -5.896840 -5.234395
##   16650009 -5.689025 -5.509126 -5.593997
## 53612 more rows ...
## 
## $F
## [1] 0.49833660 0.34161826 0.03469048 0.94708547 0.50673452
## 53612 more elements ...
## 
## $F.p.value
## [1] 0.6080015 0.7108730 0.9659079 0.3889322 0.6029325
## 53612 more elements ...

I will now view the results of the ANOVA in the slots of the data.fit.eb object. The statistic calculated in ANOVA is the F-statistic. I can retrieve the F-statistic and its corresponding p-value for each gene from the F and F.p.value slots.

data.fit.eb$F[1:7] 
## [1] 0.49833660 0.34161826 0.03469048 0.94708547 0.50673452 0.43180536 0.66442777
data.fit.eb$F.p.value[1:7]
## [1] 0.6080015 0.7108730 0.9659079 0.3889322 0.6029325 0.6497058 0.5152619
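As a sanity check, the F p-values can be recomputed from the F-statistics with base R. This sketch assumes that the p-value is the upper tail of an F distribution with 2 numerator degrees of freedom (only two of the three contrasts are linearly independent, since c = a - b) and df.total denominator degrees of freedom. The input values are copied from the printed output.

```r
# Values copied from the printed output above
F_stat   <- c(0.49833660, 0.34161826, 0.03469048)  # first three entries of $F
df_total <- 326.6625                               # $df.total

# Assumption: numerator df = 2, the number of independent contrasts
df_num <- 2

# Upper-tail probability of the F distribution
p_F <- pf(F_stat, df1 = df_num, df2 = df_total, lower.tail = FALSE)
print(round(p_F, 7))
```

If the assumption holds, the result should closely match the first three entries of F.p.value.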

ANOVA is typically followed by a series of pairwise comparisons. The moderated t-statistics and the resulting p-values of the pairwise comparisons are stored in the t and p.value slots, and the log-odds of differential expression are stored in the lods slot.

head(data.fit.eb$t)
##           Contrasts
##                      a           b           c
##   16650001 -0.05602414 -0.99268092  0.56641045
##   16650003  0.49837593  0.73142946 -0.01013413
##   16650005  0.25716124 -0.01461087  0.23816666
##   16650007  1.36395416  0.40226227  0.96534520
##   16650009 -0.31564729 -0.99444114  0.33621480
##   16650011  0.92157468  0.03116448  0.80164735
head(data.fit.eb$p.value)
##           Contrasts
##                    a         b         c
##   16650001 0.9553568 0.3216002 0.5715037
##   16650003 0.6185545 0.4650412 0.9919205
##   16650005 0.7972162 0.9883515 0.8119012
##   16650007 0.1735211 0.6877541 0.3350860
##   16650009 0.7524718 0.3207442 0.7369248
##   16650011 0.3574306 0.9751574 0.4233397
data.fit.eb$lods[1:7,]
##           Contrasts
##                    a         b         c
##   16650001 -5.732450 -5.510764 -5.502653
##   16650003 -5.622114 -5.721780 -5.643673
##   16650005 -5.704100 -5.972714 -5.618766
##   16650007 -4.899123 -5.896840 -5.234395
##   16650009 -5.689025 -5.509126 -5.593997
##   16650011 -5.352138 -5.972358 -5.361306
##   16650013 -5.705649 -5.352044 -5.537476
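The relation between these slots can be checked for a single probe set. The sketch below assumes that the moderated t-statistic is the coefficient divided by stdev.unscaled times the square root of s2.post, and that the p-value is two-sided with df.total degrees of freedom (df.prior + df.residual = 3.66246 + 323). All inputs are copied from the printed output for probe set 16650001, contrast a.

```r
# Values for probe set 16650001, contrast a, copied from the output above
coef_a         <- -0.006471273  # $coefficients
stdev_unscaled <- 0.1971515     # $stdev.unscaled
s2_post        <- 0.34326438    # $s2.post
df_total       <- 326.6625      # $df.total

# Moderated t-statistic: estimate divided by its moderated standard error
t_mod <- coef_a / (stdev_unscaled * sqrt(s2_post))

# Two-sided p-value from the t distribution
p_two_sided <- 2 * pt(-abs(t_mod), df = df_total)

print(c(t = t_mod, p.value = p_two_sided))
```

The results should be close to the entries for 16650001 in the t and p.value slots.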

The log fold changes (on the log2 scale, since the RMA expression values are log2-transformed) can be found in the coefficients slot. These are the effect sizes I am interested in.

data.fit.eb$coefficients[1:30,]
##           Contrasts
##                       a            b            c
##   16650001 -0.006471273 -0.079911687  0.073440415
##   16650003  0.063567350  0.065018301 -0.001450951
##   16650005  0.035850796 -0.001419566  0.037270362
##   16650007  0.178746843  0.036739566  0.142007277
##   16650009 -0.019497922 -0.042810615  0.023312693
##   16650011  0.119715868  0.002821417  0.116894451
##   16650013 -0.023355122 -0.074819945  0.051464823
##   16650015 -0.240824828 -0.102760445 -0.138064383
##   16650017 -0.184903489 -0.033737575 -0.151165914
##   16650019 -0.158034338 -0.118865243 -0.039169094
##   16650021  0.010459997  0.070797169 -0.060337172
##   16650023 -0.157196517 -0.096101447 -0.061095070
##   16650025 -0.048758115 -0.016857621 -0.031900493
##   16650027 -0.017225681 -0.133635046  0.116409365
##   16650029 -0.177940735 -0.032063047 -0.145877688
##   16650031 -0.067085091 -0.070443983  0.003358892
##   16650033 -0.006887861 -0.075505054  0.068617193
##   16650035 -0.083017341 -0.057965620 -0.025051721
##   16650037  0.039734685 -0.046162270  0.085896955
##   16650041 -0.138101101  0.071017616 -0.209118718
##   16650043 -0.145276036  0.001155695 -0.146431730
##   16650045 -0.251177989  0.043844131 -0.295022121
##   16650047 -0.177722941 -0.058525331 -0.119197610
##   16650049 -0.045025383 -0.010555072 -0.034470310
##   16650051 -0.193146586  0.028123458 -0.221270044
##   16650053 -0.036932911 -0.019400025 -0.017532886
##   16650055 -0.108543477 -0.019790131 -0.088753346
##   16650057  0.010844757  0.033345551 -0.022500794
##   16650059 -0.098823859  0.031173852 -0.129997711
##   16650061  0.258377464  0.009429214  0.248948250
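Because the coefficients are log2 fold changes, they can be converted to ordinary fold changes with a power of two. The following sketch uses the three coefficients of probe set 16650061 from the output above; the variable names are hypothetical.

```r
# Coefficients of probe set 16650061, copied from the output above
log2fc <- c(a = 0.258377464, b = 0.009429214, c = 0.248948250)

# Convert log2 fold changes to linear fold changes
fold_change <- 2^log2fc
print(round(fold_change, 3))

# Consistency check: contrast c (sPTD - PPROM) equals a - b
print(unname(log2fc["a"] - log2fc["b"] - log2fc["c"]))
```

A log2 fold change of 1 corresponds to a doubling, and a value of -1 to a halving, of the expression level.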

A convenient way to decide on the number of DE genes I am going to select is via a volcano plot. A volcano plot is a graph that allows me to assess simultaneously the p-values (statistical significance) and the log ratios (biological difference) of differential expression for the given genes.

volcanoplot(data.fit.eb, coef = 1, highlight = 10,xlim=c(-2,2),ylim=c(0,7),main="Volcano Plot of sPTD v/s control")

volcanoplot(data.fit.eb, coef = 2, highlight = 10,xlim=c(-2,2),ylim=c(0,7),main="Volcano Plot of PPROM v/s control")

volcanoplot(data.fit.eb, coef = 3, highlight = 10,xlim=c(-2,2),ylim=c(0,7),main="Volcano Plot of sPTD v/s PPROM")

Volcano plots arrange genes along biological and statistical significance.
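The coordinates of a volcano plot can be computed by hand. The sketch below uses the log fold changes and raw p-values of the first five probe sets for contrast a, copied from the output above; the cut-offs are illustrative assumptions, and the plot itself is only drawn in an interactive session.

```r
# First five probe sets, contrast a (sPTD - control), copied from the output above
logFC <- c(-0.006471273, 0.063567350, 0.035850796, 0.178746843, -0.019497922)
pval  <- c(0.9553568, 0.6185545, 0.7972162, 0.1735211, 0.7524718)

# x-axis: log2 fold change; y-axis: -log10 of the raw p-value
volcano <- data.frame(logFC = logFC, neg_log10_p = -log10(pval))

# Illustrative cut-offs (assumptions): |log2 FC| >= 0.5 and raw p < 0.05
volcano$candidate <- abs(volcano$logFC) >= 0.5 & pval < 0.05
print(volcano)

if (interactive()) {
  plot(volcano$logFC, volcano$neg_log10_p,
       xlab = "log2 fold change", ylab = "-log10(p-value)",
       col = ifelse(volcano$candidate, "red", "black"), pch = 16)
}
```

Genes of interest lie in the upper left and upper right corners, where both the fold change and the significance are large. None of these five probe sets falls there.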

Finally, I will adjust for multiple testing and define the DE genes. I am doing a moderated t-test for each probe set and each contrast, meaning that I will be doing far more than 20000 tests on the data set (the fit above contains 53617 probe sets and 3 contrasts).

3.5.2 Adjusting For Multiple Testing

Since I have 3 groups for the class variable, and therefore several contrasts, I use the decideTests() method to perform multiple testing adjustment on these p-values. Additionally, it evaluates for each gene and contrast whether the results in data.fit.eb fulfill the criteria for differential expression that I specify. The adjust.method argument specifies which method is used to adjust the p-values for multiple testing. The value BH means that Benjamini-Hochberg correction will be used. The p.value argument specifies the cut-off applied to the adjusted p-values (the FDR level in the case of BH), and the lfc argument specifies the minimum absolute log2 fold change that is required to be considered DE. With lfc=0.5 this corresponds to a fold change of about 1.41.
The method argument specifies how the adjustment is applied across contrasts: global means that the p-values of all genes and all contrasts are pooled and treated as a single set of tests, so the same cut-off applies to every contrast.

DEresults <- decideTests(data.fit.eb,method='global',adjust.method="BH",p.value=0.05,lfc=0.5) 
#method: "global" means all contrasts are considered independent. The method will treat the entire matrix of t-statistics as a single vector of independent tests. It is the simplest and obvious choice if you want to do multiple testing in both directions simultaneously. The p-value cutoff will be consistent across all contrasts.
#adjust.method: "BH" means Benjamini-Hochberg correction or "BY" or "holm".
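The idea behind a global Benjamini-Hochberg adjustment combined with a fold-change filter can be sketched in base R with p.adjust(). The p-values and log fold changes below are toy numbers, and the way the two criteria are combined is an assumption made for illustration; it is not claimed to reproduce the internals of decideTests().

```r
# Toy raw p-values and log2 fold changes: 3 genes (rows) x 2 contrasts (columns)
p_raw <- matrix(c(0.001, 0.20, 0.04,    # contrast a
                  0.03, 0.002, 0.60),   # contrast b
                nrow = 3, dimnames = list(c("g1", "g2", "g3"), c("a", "b")))
lfc_toy <- matrix(c(0.9,  0.1, -0.7,
                    0.3, -0.8,  0.2),
                  nrow = 3, dimnames = dimnames(p_raw))

# "global": pool every p-value into one vector, adjust once, then reshape
p_global <- matrix(p.adjust(as.vector(p_raw), method = "BH"),
                   nrow = nrow(p_raw), dimnames = dimnames(p_raw))

# For comparison: adjust each contrast (column) on its own
p_separate <- apply(p_raw, 2, p.adjust, method = "BH")

# Call a gene DE when adjusted p < 0.05 and |log2 FC| >= 0.5;
# the sign of the fold change gives the direction (1 = up, -1 = down)
calls <- sign(lfc_toy) * (p_global < 0.05 & abs(lfc_toy) >= 0.5)
print(round(p_global, 3))
print(round(p_separate, 3))
print(calls)
```

In this toy example, pooling all tests gives a stricter adjustment than adjusting each contrast separately, because the number of tests in the correction is larger.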

DEresults <- as.data.frame(DEresults)
colnames(DEresults) <- c("sPTD-control","PPROM-control","sPTD-PPROM")

ups_sPTDversusControl <- DEresults[DEresults$`sPTD-control`==1, ] #up-regulated genes for sPTD v/s control
downs_sPTDversusControl <- DEresults[DEresults$`sPTD-control`==-1, ] #down-regulated genes for sPTD v/s control

ups_PPROMversusControl <- DEresults[DEresults$`PPROM-control`==1, ] #up-regulated genes for PPROM v/s control
downs_PPROMversusControl <- DEresults[DEresults$`PPROM-control`==-1, ] #down-regulated genes for PPROM v/s control

ups_sPTDversusPPROM <- DEresults[DEresults$`sPTD-PPROM`==1, ] #up-regulated genes for sPTD v/s PPROM 
downs_sPTDversusPPROM  <- DEresults[DEresults$`sPTD-PPROM`==-1, ] #down-regulated genes for sPTD v/s PPROM 
                     sPTD v/s control  PPROM v/s control  sPTD v/s PPROM
upregulated genes                   3                  3               1
downregulated genes                 1                  0               2
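A summary table of this kind can be produced directly from a matrix of calls coded as 1 (up), -1 (down) and 0 (not DE). The sketch below uses a toy matrix with hypothetical probe names, so the counts differ from the real results above.

```r
# Toy matrix of calls: 1 = up-regulated, -1 = down-regulated, 0 = not DE
calls_toy <- cbind("sPTD-control"  = c(1,  0, -1, 0, 1),
                   "PPROM-control" = c(0,  0,  1, 0, 0),
                   "sPTD-PPROM"    = c(0, -1,  0, 0, 0))
rownames(calls_toy) <- paste0("probe", 1:5)

# Count up- and down-regulated probe sets per contrast
de_counts <- rbind("upregulated genes"   = colSums(calls_toy == 1),
                   "downregulated genes" = colSums(calls_toy == -1))
print(de_counts)

# Probe sets that are DE in at least one contrast
de_any <- rownames(calls_toy)[rowSums(calls_toy != 0) > 0]
print(de_any)
```

The same two colSums() calls could be applied to the DEresults data frame, assuming it holds only the three columns of calls.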

Finally, I’ll get the annotations of the probe IDs.

Having the gene names, I can finally do GO enrichment and pathway enrichment using some tools and/or databases.
But before that, I’m going to check whether there are any housekeeping genes among them.

## The number of Housekeeping gene equals to 0
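One simple way to perform such a check is to intersect the DE gene symbols with a list of housekeeping genes. In the sketch below, both vectors are assumptions: de_symbols holds placeholder names rather than the real DE genes, and housekeeping is a short illustrative list of commonly cited housekeeping genes, not the list used in this analysis.

```r
# Placeholder symbols standing in for the annotated DE probe sets (hypothetical)
de_symbols <- c("GENE_A", "GENE_B", "GENE_C")

# Short illustrative list of commonly cited housekeeping genes (assumption)
housekeeping <- c("GAPDH", "ACTB", "B2M", "HPRT1", "TBP")

# Symbols present in both vectors
hits <- intersect(de_symbols, housekeeping)
cat("The number of Housekeeping gene equals to", length(hits), "\n")
```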
I’ll now move to the analysis of the second type of array, the Affymetrix HTA 2.0 Array.

4 Affymetrix Human Transcriptome Array 2.0 Workflow

The Human Transcriptome Array (HTA) 2.0 is a high-resolution design that contains more than 6 million distinct probes covering coding and non-coding transcripts. About 70% of the probes on this array cover exons of coding transcripts, and the remaining 30% cover exon-exon splice junctions and non-coding transcripts. To give uniform coverage of the transcriptome, GeneChip HTA 2.0 was designed with approximately ten probes per exon and four probes per exon-exon splice junction. This coverage is intended to yield complete, accurate, and reproducible data. To make the analysis of this large amount of genetic data manageable, the probes are arranged into probe sets that summarize the data at the gene level and at the exon level. The array was designed to support human disease research and clinical translational medicine.

4.1 Open CEL-files From Newer Affymetrix Arrays (HTA)

I use the list.files() command to obtain the list of CEL files in the folder specified by celpath. Then I import all the CEL files with a single command using the read.celfiles() method.
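If the folder may also contain files that are not CEL files, list.files() can be restricted with a pattern. The sketch below runs on a temporary folder with empty dummy files, so demo_dir and its contents are assumptions. For the real data, the same call would be pointed at celpath and the result passed to read.celfiles().

```r
# Create a throw-away folder with empty dummy files so the sketch runs anywhere
demo_dir <- file.path(tempdir(), "cel_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir,
                                c("Sample_4.CEL", "Sample_5.CEL", "notes.txt"))))

# Keep only files whose names end in .CEL (case-insensitive), with full paths
cel_files <- list.files(demo_dir, pattern = "\\.CEL$",
                        ignore.case = TRUE, full.names = TRUE)
print(basename(cel_files))
```

Defining celpath without a trailing slash would also avoid the doubled slash in the paths that are printed while the files are read.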

celpath <- "~/Desktop/oliver/HTA/"
#import CEL files containing raw probe-level data into an R object
list <- list.files(celpath,full.names=TRUE)
data <- read.celfiles(list)
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_11.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_13.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_14.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_19.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_20.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_25.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_26.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_28.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_29.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_4.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Sample_5.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_006_P1A06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_018_P1B06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_030_P1C06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_041_P1D05.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_053_P1E05.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_065_P1F05.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_106_P2B02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_107_P2C02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_128_P2H04.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_133_P2E05.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_139_P2C06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_145_P2A07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_153_P2A08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_166_P2F09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_168_P2H09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_190_P2F12.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_200_P3H01.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_203_P3C02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_205_P3E02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_206_P3F02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_208_P3H02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_211_P3C03.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_231_P3G05.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_233_P3A06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_236_P3D06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_241_P3A07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_250_P3B08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_252_P3D08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_255_P3G08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_256_P3H08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_258_P3B09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_259_P3C09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_261_P3E09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_266_P3B10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_267_P3C10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_269_P3E10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_271_P3G10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_314_P4B04.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_334_P4F06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_337_P4A07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_359_P4G09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_381_P4E12.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_383_P4G12.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_412_P5D04.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_440_P5H07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_491_P6C02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_492_P6D02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_499_P6C03.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_506_P6B04.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_542_P6F08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_557_P6E10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_566_P6F11.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_584_P7H01_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_588_P7D02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_591_P7G02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_593_P7A03.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_597_P7E03.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_600_P7H03.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_611_P7C05.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_612_P7D05.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_617_P7A06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_618_P7B06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_622_P7F06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_623_P7G06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_640_P7H08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_641_P7A09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_648_P7H09_2.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_650_P7B10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_666_P7B12.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_667_P7C12.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_672_P7H12.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_702_P8F04.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_717_P8E06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_726_P8F07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_739_P8C09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_742_P8F09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_745_P08A10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_751_P8G10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_752_P8H10.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_754_P8B11.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_764_P8D12.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_765_P8E12.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_770_P9B01.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_774_P9F01.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_775_P9G01.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_777_P9A02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_782_P9F02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_786_P9B03.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_811_P9C06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_813_P9E06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_820_P9D07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_827_P9C08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_830_P9F08.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_834_P9B09.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_849_P8A11.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_879_P10G02.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_910_P10F06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_911_P10G06.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_916_P10D07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_917_P10E07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_918_P10F07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_919_P10G07.CEL
## Reading in : /Users/fatbobmacpro/Desktop/oliver/HTA//Tarca_920_P10H07.CEL
The data object is now a platform-specific FeatureSet object (here an HTAFeatureSet) containing the raw data from my CEL files.
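If the folder may also hold files other than CEL files, the listing can be restricted with a pattern. The following is only a minimal sketch: the folder demo_dir and the dummy file names are hypothetical and are created in a temporary directory purely for illustration.

```r
# Hypothetical folder with dummy files, created only for this illustration
demo_dir <- file.path(tempdir(), "cel_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir,
                                c("Sample_1.CEL", "Sample_2.CEL", "notes.txt"))))

# Keep only the files whose names end in .CEL (case-insensitive)
cel_files <- list.files(demo_dir, pattern = "\\.CEL$", full.names = TRUE,
                        ignore.case = TRUE)
basename(cel_files)
```

The resulting character vector could then be passed to read.celfiles() in the same way as above.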

4.2 Data Exploration

4.2.1 Some Initial Statistics

Each row represents one microarray probe and each column represents one microarray (sample). The expression intensity values are stored in the exprs element of the assayData sub-object and can be accessed with the exprs() function.


## The number of microarray probes is equal to 6892960 and the number of microarray samples is equal to 115
## The type of the raw_data is an HTAFeatureSet
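Counts like these can be obtained from the dimensions of the intensity matrix. As a small sketch, I use a toy matrix (the first six probes of the first three arrays) in place of the full result of exprs(); the object name toy is an assumption made for this example.

```r
# Toy intensity matrix standing in for exprs(data): 6 probes x 3 arrays
toy <- matrix(c(4630, 130, 4305, 100, 66, 43,
                5624, 170, 5204, 100, 79, 42,
                4803, 179, 4394, 110, 51, 35),
              nrow = 6,
              dimnames = list(as.character(1:6),
                              c("Sample_10.CEL", "Sample_11.CEL", "Sample_13.CEL")))

n_probes  <- nrow(toy)   # rows are probes
n_samples <- ncol(toy)   # columns are arrays
cat("The number of probes is", n_probes,
    "and the number of samples is", n_samples, "\n")
class(toy)
```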

4.2.2 Retrieving Informations About The Raw Data

How can I retrieve the intensities of specific rows in the CEL files? Two methods, exprs() and intensity(), return the intensity data. Both give the same result: a matrix with the intensities of all probes, so I am going to use only one of them.

int <- oligo::intensity(data)
int[1:10,1:10]
##    Sample_10.CEL Sample_11.CEL Sample_13.CEL Sample_14.CEL Sample_19.CEL
## 1           4630          5624          4803          1111          4171
## 2            130           170           179            56           113
## 3           4305          5204          4394          1020          3941
## 4            100           100           110            49            77
## 5             66            79            51            35            66
## 6             43            42            35            33            33
## 7             72           109            57            47            99
## 8            212           310           134            53           232
## 9             49            50            30            33            47
## 10            65            87            38            35            66
##    Sample_20.CEL Sample_25.CEL Sample_26.CEL Sample_28.CEL Sample_29.CEL
## 1           4984          5277          5401          5174          4669
## 2            161           153           167           194           151
## 3           4906          4877          5191          4729          4165
## 4            111           143           145            95            87
## 5             51            60            67            74            50
## 6             34            47            44            43            39
## 7            128           133           161            86           104
## 8            360           307           289           305           266
## 9             57            50            45            58            57
## 10            66            67            35            62            70

How can I retrieve the intensities of only the PM (perfect match) probes? Since I am only working with PM probes, I can extract them with the pm() method.

pm <- oligo::pm(data)
pm[1:10,1:10]
##    Sample_10.CEL Sample_11.CEL Sample_13.CEL Sample_14.CEL Sample_19.CEL
## 6             43            42            35            33            33
## 7             72           109            57            47            99
## 8            212           310           134            53           232
## 9             49            50            30            33            47
## 10            65            87            38            35            66
## 11            48            46            34            34            46
## 12           110           173           115            54           130
## 13           280           342           125            46           224
## 14            57           103            59            26            85
## 15           133           135            57            36            85
##    Sample_20.CEL Sample_25.CEL Sample_26.CEL Sample_28.CEL Sample_29.CEL
## 6             34            47            44            43            39
## 7            128           133           161            86           104
## 8            360           307           289           305           266
## 9             57            50            45            58            57
## 10            66            67            35            62            70
## 11            66            35            48            68            79
## 12           210           204           156           194           208
## 13           443           355           277           470           352
## 14           121            79            98            89           102
## 15           168           142           111           125           119

Apart from the expression data itself, microarray data sets need to include information about the samples that were hybridized to the arrays. This information is stored in the phenoData slot, which contains labels for the samples. However, for most data sets the phenoData has not been defined. How can I retrieve the sample annotation of the data?

ph <- data@phenoData; ph
## An object of class 'AnnotatedDataFrame'
##   rowNames: Sample_10.CEL Sample_11.CEL ... Tarca_920_P10H07.CEL (115
##     total)
##   varLabels: index
##   varMetadata: labelDescription channel
I’ll finally retrieve the first few and last few IDs of the probe sets that are represented on the arrays.
head(featureNames(data))
## [1] "1" "2" "3" "4" "5" "6"
tail(featureNames(data))
## [1] "6892955" "6892956" "6892957" "6892958" "6892959" "6892960"

4.2.3 Checking Missing Values

I’ll check for the presence of several types of missing values in this HTAFeatureSet.
#Extract the intensity matrix once
x <- Biobase::exprs(data)

#Count NA values (is.na() is also TRUE for NaN)
NA_values <- sum(is.na(x))

#Count NaN values
NaN_values <- sum(is.nan(x))

#Count infinite values
infinite_values <- sum(is.infinite(x))

#Count blank ("") entries column by column
blank_values <- function (x) {sum(x=="", na.rm=TRUE) }
count <- sum(apply(x, 2, blank_values))
This table summarizes the counts of the different types of missing values.
Type of value     Count
NA values         0
NaN values        0
Infinite values   0
Blank values      0
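To make clear what each of these checks counts, here is a small sketch on a toy matrix with known problem values. The matrix toy is made up for this example; note that is.na() is TRUE for both NA and NaN, whereas is.nan() only flags NaN.

```r
# Toy matrix with one NA, one NaN and one infinite value
toy <- matrix(c(1, NA, 3, NaN, Inf, 6), nrow = 3)

na_count  <- sum(is.na(toy))        # counts NA and NaN together
nan_count <- sum(is.nan(toy))       # counts NaN only
inf_count <- sum(is.infinite(toy))  # counts Inf and -Inf

# Number of NA/NaN entries per column (i.e. per array)
na_per_column <- colSums(is.na(toy))

c(NA_values = na_count, NaN_values = nan_count, Infinite_values = inf_count)
```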

4.3 Quality Check

The phenoData object that I retrieved with the sample annotation does not contain any information, so Bioconductor just gives the CEL files an index from 1 to 115. However, the phenoData will be used as labels in plots, so I am going to give the samples more informative names for the plots that I am going to create.

#59 controls, 29 PPROM and 27 sPTD samples, numbered within each group
ph@data[,1] <- c(paste0("Control",1:59),paste0("PPROM",1:29),paste0("sPTD",1:27)); ph
## An object of class 'AnnotatedDataFrame'
##   rowNames: Sample_10.CEL Sample_11.CEL ... Tarca_920_P10H07.CEL (115
##     total)
##   varLabels: index
##   varMetadata: labelDescription channel

It’s time to create some plots to assess the quality of the data.

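The image of a microarray can reveal large spatial inconsistencies on an individual array. How can I plot the raw intensities of a microarray?

#the sample labels are in the first column of ph@data (named index), so I use that column for the title
image(data[,1], main=ph@data[1,1])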

Another quality control check is to draw a boxplot for each array. A boxplot is a standardized way of displaying a dataset based on a five-number summary: the minimum, the first quartile, the sample median, the third quartile, and the maximum.

#labels come from the first column of ph@data; colors: 59 controls (red), 29 PPROM (green), 27 sPTD (blue)
oligo::boxplot(data,ylim = c(0,9),target = "core", main = "Boxplot of log2-intensities for the raw data",las=2,names=ph@data[,1],col=rep(c("red","green","blue"),times=c(59,29,27)))

When I look at the boxplot, I see that the intensity distributions of the individual arrays are quite different, indicating the need for an appropriate normalization.
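The five-number summary behind each box can also be computed directly. As a small sketch, I take the first ten raw intensities of Sample_10.CEL shown earlier and apply fivenum() from base R on the log2 scale; the variable names are chosen for this example only.

```r
# First ten raw intensities of Sample_10.CEL, as printed in the intensity table
raw_int <- c(4630, 130, 4305, 100, 66, 43, 72, 212, 49, 65)

# Boxplots of microarray data are drawn on the log2 scale
log_int <- log2(raw_int)

# Minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(log_int)
names(five) <- c("min", "lower hinge", "median", "upper hinge", "max")
round(five, 2)
```

Comparing these five numbers across arrays gives the same information as comparing the boxes visually.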

A third quality control check is to plot a density estimate of the intensities for a few samples.
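The idea can be sketched with the density() function from base R. The two arrays below are simulated log2 intensities, not my real data, and the means and standard deviations are arbitrary assumptions.

```r
set.seed(42)

# Simulated log2 intensities for two hypothetical arrays
array1 <- rnorm(1000, mean = 6, sd = 1.5)
array2 <- rnorm(1000, mean = 7, sd = 1.5)

# Kernel density estimates of the two intensity distributions
d1 <- density(array1)
d2 <- density(array2)

pdf(NULL)  # null graphics device so the sketch also runs non-interactively
plot(d1, main = "Density of log2-intensities", xlab = "log2 intensity",
     xlim = range(c(d1$x, d2$x)), ylim = range(c(d1$y, d2$y)), col = "red")
lines(d2, col = "blue")
invisible(dev.off())
```

Curves that are shifted relative to each other, as in this simulation, indicate that the arrays need to be normalized.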

4.4 Data Normalization

4.4.1 Robust Multi-array Average Algorithm

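The standard method for normalization is RMA (Robust Multi-array Average), which is one of the few normalization methods that uses only the PM probes. As the output below shows, it proceeds in three steps: background correction, normalization, and calculation of the expression values.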

data.rma <- oligo::rma(data)
## Background correcting
## Normalizing
## Calculating Expression
data.matrix <- Biobase::exprs(data.rma)
Normalization can also be done using the GCRMA algorithm.

4.4.2 Checking The Effect Of The Normalization

After normalization, I need to visualize the data again. For that, I’ll plot boxplots of the normalized data for some microarrays.

These boxplots are now aligned, showing that the intensity distributions of the individual microarrays are quite similar.
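The alignment comes from the normalization step of RMA, which is a quantile normalization. The following is only a base R sketch of that single step on a toy matrix; it is not the oligo implementation, it omits background correction and summarization, and it breaks ties by order of appearance, which is a simplification.

```r
# Sketch of quantile normalization: force every column to share the same distribution
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank of each value within its column
  sorted <- apply(x, 2, sort)                         # each column sorted increasingly
  target <- rowMeans(sorted)                          # reference distribution: mean of each quantile
  out <- apply(ranks, 2, function(r) target[r])       # replace every value by the reference quantile
  dimnames(out) <- dimnames(x)
  out
}

# Toy matrix: 4 probes (rows) on 3 arrays (columns)
toy <- cbind(a = c(5, 2, 3, 4), b = c(4, 1, 4, 2), c = c(3, 4, 6, 8))
toy_qn <- quantile_normalize(toy)
toy_qn
```

After this step every column contains exactly the same set of values, only in a different order, which is why the boxplots line up.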

4.4.3 About The Principal Component Analysis

I will now perform a Principal Component Analysis (PCA) in order to check whether the overall variability of the samples reflects their grouping. But before that, let’s see what PCA is and what it does.

PCA is a standard technique for visualizing high-dimensional data and for data pre-processing. PCA reduces the dimensionality (the number of variables) of a data set while retaining as much variance as possible.

4.4.3.1 Dimensionality Reduction

Low variance can often be assumed to represent undesired background noise. The dimensionality of the data can therefore be reduced, without loss of relevant information, by extracting a lower dimensional component space covering the highest variance.
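How much variance each component covers can be read from the standard deviations returned by prcomp(). The sketch below uses simulated data (30 hypothetical samples, 50 hypothetical genes, two made-up groups), so the numbers have no biological meaning.

```r
set.seed(123)

# Simulated expression matrix: 30 samples (rows) x 50 genes (columns)
group <- rep(c("A", "B"), each = 15)
expr  <- matrix(rnorm(30 * 50), nrow = 30)

# Shift the first 5 genes in group B so that the groups differ
expr[group == "B", 1:5] <- expr[group == "B", 1:5] + 3

pc <- prcomp(expr, scale. = TRUE)

# Proportion of the total variance explained by each principal component
var_explained <- pc$sdev^2 / sum(pc$sdev^2)
round(head(var_explained, 3), 3)

# Coordinates of the samples on the first two components
head(pc$x[, 1:2])
```

If the first few components already cover most of the variance, the remaining ones can be dropped with little loss of information.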

4.4.3.2 PCA & Bioinformatics

The illustration shows three-dimensional gene expression data that lie mainly within a two-dimensional subspace. PCA visualizes these data by reducing their dimensionality: the three original variables (genes) are reduced to two new variables termed principal components (PCs). Such a two-dimensional visualization of the samples allows us to draw qualitative conclusions about the separability of the experimental conditions (marked by different colors).

Legend:
Left side: I can identify the two-dimensional plane that optimally describes the highest variance of the data.
Right side: This two-dimensional subspace can then be rotated and presented as a two-dimensional component space.
Class:Color
Control:red
PPROM:green
sPTD:blue

I’ll now create a PCA plot using the prcomp() method.

#one color per sample: 59 controls (red), 29 PPROM (green), 27 sPTD (blue)
color <- rep(c("red","green","blue"),times=c(59,29,27))

data.PC <- prcomp(t(data.matrix),scale.=TRUE)
#t: transposes the matrix so that samples are in rows and probe sets in columns
#scale.: a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is FALSE, but in general scaling is advisable.

#plot the score of each of the 115 samples on the first principal component
plot(data.PC$x[,1],col=color,ylab="PC1")

With the samples colored by the three levels of my data, the plot shows a clear distinction between these groups. Besides PCA plots, histograms can also be used to compare the raw and the normalized data.


4.5 Differential Gene Expression

The identification of differentially expressed (DE) genes is done neither by the affy nor by the oligo package but by the limma package. Limma uses the output of the rma() method (data.rma) as input.

4.5.1 Three groups of samples

As an example I will compare spontaneous preterm labor and delivery with intact membranes (sPTD) and preterm premature rupture of the membranes (PPROM) to a set of Control women.

I first need to tell limma which samples are replicates and which samples belong to different groups by providing this information in the phenoData slot of the HTAFeatureSet. To this end, I will add a second column with sample annotation describing the source of each sample. I will then give this new column a name.

#group of each sample: 59 controls, 29 PPROM and 27 sPTD
ph@data[ ,2] <- rep(c("Control","PPROM","sPTD"),times=c(59,29,27))

colnames(ph@data)[2] <- "level"

So, the factor that determines the grouping will have 3 levels.

groups <- ph@data$level
f <- factor(groups,levels = c("Control","sPTD","PPROM"))

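Then, I need to create a design matrix, which tells the model which samples belong to which group. Limma fits a linear model to each gene (with a single grouping factor this corresponds to a one-way ANOVA), so it needs such a design matrix. I will create it using the model.matrix() function. The formula ~ 0 + f removes the intercept, so that each column of the design matrix corresponds to one group.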

design <- model.matrix(~ 0 + f) 
colnames(design) <- c("Control","sPTD","PPROM")

#Fit linear model for each gene given a series of arrays
data.fit <- lmFit(object = data.rma, design = design) 
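To see what such a design matrix looks like, here is a small sketch with only five hypothetical samples. The sample order is made up; the factor levels are the same three groups as above.

```r
# Five hypothetical samples and their groups
groups_toy <- c("Control", "Control", "sPTD", "PPROM", "sPTD")
f_toy <- factor(groups_toy, levels = c("Control", "sPTD", "PPROM"))

# One column per group, no intercept: a 1 marks the group of each sample
design_toy <- model.matrix(~ 0 + f_toy)
colnames(design_toy) <- levels(f_toy)
design_toy

# Number of samples per group
colSums(design_toy)
```

Each row contains exactly one 1, so the fitted coefficient of a column is simply the mean expression of that group.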

Afterwards, I need to tell limma which groups I want to compare. For this I define a contrast matrix defining the contrasts (comparisons) of interest by using the makeContrasts() method.

contrast.matrix <- makeContrasts(a=sPTD-Control,b=PPROM-Control,c=sPTD-PPROM,levels=design)
data.fit.con <- contrasts.fit(data.fit,contrast.matrix)
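What the contrast matrix does can be illustrated in base R without limma. In this sketch the group means for one gene are hypothetical values on the log2 scale; the contrast matrix has the same layout as the three comparisons a, b and c defined above.

```r
# Hypothetical group means (log2 scale) for a single gene
group_means <- c(Control = 8.0, sPTD = 8.5, PPROM = 8.2)

# Columns: a = sPTD - Control, b = PPROM - Control, c = sPTD - PPROM
contrasts_toy <- cbind(a = c(-1, 1, 0),
                       b = c(-1, 0, 1),
                       c = c(0, 1, -1))
rownames(contrasts_toy) <- names(group_means)

# Each contrast is a weighted sum of the group means, i.e. a log fold change
lfc_toy <- drop(group_means %*% contrasts_toy)
lfc_toy
```

Because c equals a minus b by construction, the three comparisons are not independent of one another.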

data.fit.eb <- eBayes(data.fit.con)
data.fit.eb
## An object of class "MArrayLM"
## $coefficients
##             Contrasts
##                      a           b          c
##   2824546_st 0.1957635 -0.01687591 0.21263942
##   2824549_st 0.2927558  0.15484979 0.13790597
##   2824551_st 0.2634952  0.18959655 0.07389866
##   2824554_st 0.3362027  0.28500707 0.05119561
##   2827992_st 0.5283583  0.49529253 0.03306574
## 70518 more rows ...
## 
## $rank
## [1] 3
## 
## $assign
## [1] 1 1 1
## 
## $qr
## $qr
##      Control      sPTD     PPROM
## 1 -7.6811457  0.000000  0.000000
## 2  0.1301889 -5.196152  0.000000
## 3  0.1301889  0.000000 -5.385165
## 4  0.1301889  0.000000  0.000000
## 5  0.1301889  0.000000  0.000000
## 110 more rows ...
## 
## $qraux
## [1] 1.130189 1.000000 1.000000
## 
## $pivot
## [1] 1 2 3
## 
## $tol
## [1] 1e-07
## 
## $rank
## [1] 3
## 
## 
## $df.residual
## [1] 112 112 112 112 112
## 70518 more elements ...
## 
## $sigma
## 2824546_st 2824549_st 2824551_st 2824554_st 2827992_st 
##  0.9062726  0.6663917  0.9200431  1.1218266  0.8486486 
## 70518 more elements ...
## 
## $cov.coefficients
##          Contrasts
## Contrasts          a           b           c
##         a 0.05398619  0.01694915  0.03703704
##         b 0.01694915  0.05143191 -0.03448276
##         c 0.03703704 -0.03448276  0.07151980
## 
## $stdev.unscaled
##             Contrasts
##                      a        b         c
##   2824546_st 0.2323493 0.226786 0.2674319
##   2824549_st 0.2323493 0.226786 0.2674319
##   2824551_st 0.2323493 0.226786 0.2674319
##   2824554_st 0.2323493 0.226786 0.2674319
##   2827992_st 0.2323493 0.226786 0.2674319
## 70518 more rows ...
## 
## $pivot
## [1] 1 2 3
## 
## $Amean
## 2824546_st 2824549_st 2824551_st 2824554_st 2827992_st 
##   9.356964   9.038516   8.745541   8.576868   8.669450 
## 70518 more elements ...
## 
## $method
## [1] "ls"
## 
## $design
##   Control sPTD PPROM
## 1       1    0     0
## 2       1    0     0
## 3       1    0     0
## 4       1    0     0
## 5       1    0     0
## 110 more rows ...
## 
## $contrasts
##          Contrasts
## Levels     a  b  c
##   Control -1 -1  0
##   sPTD     1  0  1
##   PPROM    0  1 -1
## 
## $df.prior
## [1] 3.655714
## 
## $s2.prior
## [1] 0.02530346
## 
## $var.prior
## [1] 1.7649239 1.0304117 0.9079151
## 
## $proportion
## [1] 0.01
## 
## $s2.post
## 2824546_st 2824549_st 2824551_st 2824554_st 2827992_st 
##  0.7961687  0.4308411  0.8205231  1.2195156  0.6982396 
## 70518 more elements ...
## 
## $t
##             Contrasts
##                      a          b         c
##   2824546_st 0.9442519 -0.0833966 0.8911034
##   2824549_st 1.9195771  1.0402452 0.7856179
##   2824551_st 1.2519471  0.9229299 0.3050549
##   2824554_st 1.3102862  1.1380088 0.1733508
##   2827992_st 2.7213535  2.6136247 0.1479663
## 70518 more rows ...
## 
## $df.total
## [1] 115.6557 115.6557 115.6557 115.6557 115.6557
## 70518 more elements ...
## 
## $p.value
##             Contrasts
##                        a          b         c
##   2824546_st 0.347009380 0.93368036 0.3747238
##   2824549_st 0.057374816 0.30039573 0.4336979
##   2824551_st 0.213115114 0.35796516 0.7608726
##   2824554_st 0.192695327 0.25746876 0.8626787
##   2827992_st 0.007506803 0.01014979 0.8826270
## 70518 more rows ...
## 
## $lods
##             Contrasts
##                      a         b         c
##   2824546_st -5.919161 -6.114861 -5.533759
##   2824549_st -4.579553 -5.600944 -5.615907
##   2824551_st -5.592055 -5.710613 -5.860135
##   2824554_st -5.519962 -5.499742 -5.889574
##   2827992_st -2.844309 -2.934606 -5.893386
## 70518 more rows ...
## 
## $F
## [1] 0.5293839 1.9420843 0.9346243 1.1447720 5.3880558
## 70518 more elements ...
## 
## $F.p.value
## [1] 0.590387938 0.148052969 0.395679018 0.321876206 0.005790017
## 70518 more elements ...

I will now view the results of the ANOVA in the slots of the data.fit.eb object. The statistic calculated in ANOVA is the F-statistic; I can retrieve it and its corresponding p-value for each gene from the F and F.p.value slots.

head(data.fit.eb$F)
## [1] 0.5293839 1.9420843 0.9346243 1.1447720 5.3880558 2.4682771
head(data.fit.eb$F.p.value)
## [1] 0.590387938 0.148052969 0.395679018 0.321876206 0.005790017 0.089184085

ANOVA is usually followed by a series of pairwise comparisons. The moderated t-statistics and the resulting p-values of the pairwise comparisons are stored in the t and p.value slots, and the log-odds of differential expression are stored in the lods slot.

data.fit.eb$t[1:10,]
##             Contrasts
##                       a          b           c
##   2824546_st  0.9442519 -0.0833966  0.89110339
##   2824549_st  1.9195771  1.0402452  0.78561791
##   2824551_st  1.2519471  0.9229299  0.30505491
##   2824554_st  1.3102862  1.1380088  0.17335082
##   2827992_st  2.7213535  2.6136247  0.14796633
##   2827995_st  1.7580437  1.8518982 -0.04301835
##   2827996_st  2.6059483  2.9737162 -0.25766239
##   2828010_st  2.3791492  3.4213779 -0.83433278
##   2828012_st -0.2489306  0.5421243 -0.67600429
##   2835442_st  1.9643820  1.4253368  0.49798202
data.fit.eb$p.value[1:10,]
##             Contrasts
##                        a            b         c
##   2824546_st 0.347009380 0.9336803630 0.3747238
##   2824549_st 0.057374816 0.3003957335 0.4336979
##   2824551_st 0.213115114 0.3579651610 0.7608726
##   2824554_st 0.192695327 0.2574687581 0.8626787
##   2827992_st 0.007506803 0.0101497910 0.8826270
##   2827995_st 0.081385543 0.0665907706 0.9657611
##   2827996_st 0.010366893 0.0035814744 0.7971254
##   2828010_st 0.018989804 0.0008619341 0.4058135
##   2828012_st 0.803855965 0.5887758773 0.5003875
##   2835442_st 0.051884453 0.1567530869 0.6194423
data.fit.eb$lods[1:10,]
##             Contrasts
##                      a          b         c
##   2824546_st -5.919161 -6.1148607 -5.533759
##   2824549_st -4.579553 -5.6009443 -5.615907
##   2824551_st -5.592055 -5.7106125 -5.860135
##   2824554_st -5.519962 -5.4997425 -5.889574
##   2827992_st -2.844309 -2.9346059 -5.893386
##   2827995_st -4.861740 -4.4959277 -5.902755
##   2827996_st -3.127251 -2.0323862 -5.872593
##   2828010_st -3.651428 -0.7746806 -5.579238
##   2828012_st -6.323437 -5.9772155 -5.690435
##   2835442_st -4.497164 -5.1512097 -5.787821
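The slots printed above are related to each other. As a worked check, the moderated t-statistic of a contrast is its coefficient divided by the unscaled standard deviation times the square root of the posterior variance, and the p-value follows from a t-distribution with df.total degrees of freedom. The sketch below plugs in the rounded values printed for probe set 2824546_st and contrast a, so small rounding differences are to be expected.

```r
# Values printed above for probe set 2824546_st, contrast a (sPTD - Control)
coef_a   <- 0.1957635   # from $coefficients
stdev_a  <- 0.2323493   # from $stdev.unscaled
s2_post  <- 0.7961687   # from $s2.post
df_total <- 115.6557    # from $df.total

# Moderated t-statistic and its two-sided p-value
t_a <- coef_a / (stdev_a * sqrt(s2_post))
p_a <- 2 * pt(-abs(t_a), df = df_total)

c(t = t_a, p.value = p_a)
```

The results can be compared with the first row of the t and p.value tables above.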
The log fold changes can be found in the coefficients slot. This is what we are interested in.
data.fit.eb$coefficients[1:30,]
##             Contrasts
##                         a            b           c
##   2824546_st  0.195763508 -0.016875912  0.21263942
##   2824549_st  0.292755760  0.154849791  0.13790597
##   2824551_st  0.263495211  0.189596554  0.07389866
##   2824554_st  0.336202679  0.285007071  0.05119561
##   2827992_st  0.528358275  0.495292534  0.03306574
##   2827995_st  0.394706202  0.405822747 -0.01111654
##   2827996_st  0.639348606  0.712108951 -0.07276034
##   2828010_st  0.629716499  0.883892470 -0.25417597
##   2828012_st -0.073093633  0.155372834 -0.22846647
##   2835442_st  0.491023025  0.347751010  0.14327202
##   2835447_st  0.268143915 -0.026855591  0.29499951
##   2835453_st  0.728764374 -0.742317167  1.47108154
##   2835456_st  0.259359766  0.279606306 -0.02024654
##   2835459_st  0.094126933  0.172027392 -0.07790046
##   2835461_st  0.271307069  0.259616677  0.01169039
##   2839509_st  0.152366626  0.035537794  0.11682883
##   2839511_st  0.185356297  0.115081206  0.07027509
##   2839513_st  0.162886537  0.192477172 -0.02959064
##   2839515_st  0.011226634 -0.019332259  0.03055889
##   2839517_st  0.026322657  0.062023632 -0.03570098
##   2839524_st  0.258321978  0.197476333  0.06084565
##   2839528_st  0.427048180  0.309238227  0.11780995
##   2839532_st  0.258582841  0.290925441 -0.03234260
##   2839538_st  0.006356328 -0.020058561  0.02641489
##   2839539_st -0.213419168 -0.240033563  0.02661440
##   2858288_st  0.092428444 -0.115961172  0.20838962
##   2886354_st  0.082782659  0.009741304  0.07304136
##   2886356_st  0.204621528  0.068518174  0.13610335
##   2886364_st  0.354728838  0.148670865  0.20605797
##   2886370_st  0.247361073 -0.005191360  0.25255243

4.5.2 Drawing Volcano Plot

The best way to decide how many DE genes to select is a volcano plot. I want to find genes that are DE between the groups of asymptomatic women being compared. The highlight argument specifies the number of top-scoring genes whose names will be attached to the plot.

volcanoplot(data.fit.eb, coef = 1, highlight = 10,xlim=c(-2,2),ylim=c(0,7),main="Volcano Plot of sPTD v/s control")

volcanoplot(data.fit.eb, coef = 2, highlight = 10,xlim=c(-2,2),ylim=c(0,7),main="Volcano Plot of PPROM v/s control")

volcanoplot(data.fit.eb, coef = 3, highlight = 10,xlim=c(-2,2),ylim=c(0,7),main="Volcano Plot of sPTD v/s PPROM")

Volcano plots arrange genes along biological and statistical significance. The X-axis gives the log fold change between the two groups, and the Y-axis gives the -log10 of the p-value of the per-gene test comparing the groups, so the most significant genes sit at the top. Hence, the first axis indicates the biological impact of the change; the second indicates the statistical evidence for the change.
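
To make the two axes concrete, here is a minimal sketch of how volcano-plot coordinates can be computed by hand. The helper `volcano_coords`, its default cut-offs and the toy values are assumptions made for illustration; they are not taken from the data set, and the flag uses raw p-values only to keep the example short.

```r
# Hypothetical helper: build volcano-plot coordinates from log2 fold
# changes and p-values, and flag points that pass both cut-offs.
volcano_coords <- function(lfc, p, lfc_cut = 0.7, p_cut = 0.05) {
  data.frame(
    lfc = lfc,                  # X-axis: log2 fold change
    neg_log10_p = -log10(p),    # Y-axis: -log10 p-value
    flagged = abs(lfc) >= lfc_cut & p <= p_cut
  )
}

# Toy values, invented for illustration only
toy_lfc <- c(1.47, -0.74, 0.03, 0.88)
toy_p   <- c(1e-6, 0.2, 0.9, 0.001)

vc <- volcano_coords(toy_lfc, toy_p)
vc
```

With the fitted object, the same helper could presumably be called as `volcano_coords(data.fit.eb$coefficients[, 1], data.fit.eb$p.value[, 1])` and drawn with `plot(vc$lfc, vc$neg_log10_p)`.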

I am doing a t-test on each gene, meaning that I will be doing more than 20000 t-tests on the data set. I therefore have to adjust the p-values of the t-tests for multiple testing. My final aim is to obtain the list of DE genes (the genes with the lowest adjusted p-values and the most extreme log fold changes). I will then use their IDs to search for functional relations between the genes.
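
The need for adjustment is easy to see with a little arithmetic: if 20000 genes with no true difference are each tested at 0.05, about 1000 false positives are expected. The sketch below shows this, and then works through the Benjamini-Hochberg procedure (one common correction) by hand and compares it with base R's `p.adjust()`. The helper `bh_adjust` and the p-values are made up for illustration.

```r
# Expected number of false positives when 20000 true-null genes
# are each tested at a 0.05 significance level
n_tests <- 20000
alpha <- 0.05
n_tests * alpha   # 1000

# Benjamini-Hochberg adjustment written out by hand (illustrative helper)
bh_adjust <- function(p) {
  n <- length(p)
  o <- order(p, decreasing = TRUE)           # largest p-value first
  adj <- pmin(1, cummin(n / (n:1) * p[o]))   # scale by n/rank, enforce monotonicity
  adj[order(o)]                              # back to the original order
}

p_raw <- c(0.001, 0.008, 0.039, 0.041, 0.60)  # invented p-values

bh_adjust(p_raw)
p.adjust(p_raw, method = "BH")                # base R reference implementation
```

Both calls should return the same adjusted values; in practice `p.adjust()` is the one to use.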

4.5.3 Adjusting For Multiple Testing

Since I have 3 groups for the class variable, and therefore 3 contrasts, I use the decideTests() method to adjust the p-values for multiple testing. Additionally, it will evaluate for each gene whether the results in data.fit.eb fulfil the criteria for differential expression that I specify. The adjust.method argument specifies which method is used to adjust the p-values for multiple testing. The value BH means that the Benjamini-Hochberg correction will be used. The p.value argument specifies the FDR cut-off and the lfc argument specifies the minimum absolute log2 fold change that is required for a gene to be considered DE. With method='global', the p-values of all genes and all contrasts are adjusted together.

DEresults <- decideTests(data.fit.eb,method='global',adjust.method="BH",p.value=0.05,lfc=0.7) 
#adjust.method: character string specifying p-value adjustment method. Possible values are "none", "BH", "fdr" (equivalent to "BH"), "BY" and "holm".
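
# The thresholding idea behind the call above can be approximated in base R.
# This is only a rough sketch under stated assumptions, not limma's actual
# implementation: classify_de is a hypothetical helper and the toy matrices
# are invented for illustration.

```r
# Rough approximation: pool the p-values of all genes and contrasts,
# adjust them together ("global"), then keep the sign of the log fold
# change wherever both cut-offs are met (1 = up, -1 = down, 0 = not DE).
classify_de <- function(lfc, p, p_cut = 0.05, lfc_cut = 0.7) {
  p_adj <- matrix(p.adjust(as.vector(p), method = "BH"),
                  nrow = nrow(p), dimnames = dimnames(p))
  sign(lfc) * (p_adj < p_cut & abs(lfc) > lfc_cut)
}

# Toy input: 3 genes x 2 contrasts (values invented for illustration)
toy_lfc <- matrix(c(1.5, 0.2, -0.9,
                    0.8, -1.1, 0.1), nrow = 3,
                  dimnames = list(paste0("gene", 1:3), c("a", "b")))
toy_p <- matrix(c(0.0005, 0.40, 0.004,
                  0.30, 0.002, 0.75), nrow = 3,
                dimnames = dimnames(toy_lfc))

toy_de <- classify_de(toy_lfc, toy_p)
toy_de
```

# Note that gene1 has a large log fold change in contrast b (0.8) but is
# still coded 0 there, because its adjusted p-value does not pass the cut-off.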

DEresults <- as.data.frame(DEresults)
colnames(DEresults) <- c("sPTD-control","PPROM-control","sPTD-PPROM")

ups_sPTDversusControl <- DEresults[DEresults$`sPTD-control`==1, ] #up-regulated genes for sPTD v/s control
downs_sPTDversusControl <- DEresults[DEresults$`sPTD-control`==-1, ] #down-regulated genes for sPTD v/s control

ups_PPROMversusControl <- DEresults[DEresults$`PPROM-control`==1, ] #up-regulated genes for PPROM v/s control
downs_PPROMversusControl <- DEresults[DEresults$`PPROM-control`==-1, ] #down-regulated genes for PPROM v/s control

ups_sPTDversusPPROM <- DEresults[DEresults$`sPTD-PPROM`==1, ] #up-regulated genes for sPTD v/s PPROM 
downs_sPTDversusPPROM  <- DEresults[DEresults$`sPTD-PPROM`==-1, ] #down-regulated genes for sPTD v/s PPROM 
                      sPTD v/s control   PPROM v/s control   sPTD v/s PPROM
upregulated genes     33                 16                  7
downregulated genes   41                 14                  14

Finally, I’ll retrieve the annotations for the probe IDs.

GPL17586.45144.4 <- read.delim("~/Downloads/GPL17586-45144-4.txt", comment.char="#")

index1 <- c()
v1 <- (rownames(ups_sPTDversusControl))
for(a in v1){ index1 <- c(index1, rownames(GPL17586.45144.4[ GPL17586.45144.4$ID==a, ])) }
write.table(GPL17586.45144.4 [ index1 , c(1,2,3,4,5,6,8) ], file = "~/Desktop/microarray_analysis/Human Transcriptome Array 2.0 /upregulated_sPTDversusControl.xls",col.names=NA,sep="\t",quote=F)

index2 <- c()
v2 <- (rownames(downs_sPTDversusControl))
for(b in v2){ index2 <- c(index2, rownames(GPL17586.45144.4[ GPL17586.45144.4$ID==b, ])) }
write.table(GPL17586.45144.4 [ index2 , c(1,2,3,4,5,6,8) ], file = "~/Desktop/microarray_analysis/Human Transcriptome Array 2.0 /downregulated_sPTDversusControl.xls",col.names=NA,sep="\t",quote=F)

index3 <- c()
v3 <- (rownames(ups_PPROMversusControl))
for(p in v3){ index3 <- c(index3, rownames(GPL17586.45144.4[ GPL17586.45144.4$ID==p, ])) }
write.table(GPL17586.45144.4 [ index3 , c(1,2,3,4,5,6,8) ], file = "~/Desktop/microarray_analysis/Human Transcriptome Array 2.0 /upregulated_PPROMversusControl.xls",col.names=NA,sep="\t",quote=F)

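index4 <- c()
v4 <- (rownames(downs_PPROMversusControl))
for(d in v4){ index4 <- c(index4, rownames(GPL17586.45144.4[ GPL17586.45144.4$ID==d, ])) }
write.table(GPL17586.45144.4 [ index4 , c(1,2,3,4,5,6,8) ], file = "~/Desktop/microarray_analysis/Human Transcriptome Array 2.0 /downregulated_PPROMversusControl.xls",col.names=NA,sep="\t",quote=F)

index6 <- c()
v6 <- (rownames(ups_sPTDversusPPROM))
for(f in v6){ index6 <- c(index6, rownames(GPL17586.45144.4[ GPL17586.45144.4$ID==f, ])) }
write.table(GPL17586.45144.4 [ index6 , c(1,2,3,4,5,6,8) ], file = "~/Desktop/microarray_analysis/Human Transcriptome Array 2.0 /upregulated_sPTDversusPPROM.xls",col.names=NA,sep="\t",quote=F)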

index5 <- c()
v5 <- (rownames(downs_sPTDversusPPROM))
for(e in v5){ index5 <- c(index5, rownames(GPL17586.45144.4[ GPL17586.45144.4$ID==e, ])) }
write.table(GPL17586.45144.4 [ index5 , c(1,2,3,4,5,6,8) ], file = "~/Desktop/microarray_analysis/Human Transcriptome Array 2.0 /downregulated_sPTDversusPPROM.xls",col.names=NA,sep="\t",quote=F)
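
The lookup loops above can also be written without an explicit loop. The following is a minimal sketch, assuming the annotation table has one row per probe ID; `annotate_probes` and the toy annotation table are hypothetical and only stand in for the GPL17586 file.

```r
# Hypothetical helper: return the annotation rows for a set of probe IDs,
# in the order of the IDs, silently skipping IDs that are not annotated.
annotate_probes <- function(probe_ids, annotation, id_col = "ID") {
  hits <- match(probe_ids, annotation[[id_col]])
  annotation[hits[!is.na(hits)], , drop = FALSE]
}

# Toy annotation table (invented rows standing in for the platform file)
toy_annot <- data.frame(
  ID = c("p1_st", "p2_st", "p3_st"),
  gene = c("GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

toy_hits <- annotate_probes(c("p3_st", "p1_st", "missing_st"), toy_annot)
toy_hits
```

One helper call per gene list would then replace each loop, for example `annotate_probes(rownames(ups_sPTDversusControl), GPL17586.45144.4)`, and the result can be passed to write.table() as before.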

Having the gene names, I can finally do some enrichment analysis, such as network and pathway analysis.
But before that, I’ll check whether there are any housekeeping genes among them.

## The number of Housekeeping gene equals to 2809
This brings us to the end of the workflow for differential gene expression using Affymetrix microarrays.